Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2019/11/06 01:46:04 UTC

[VOTE] AIP-21 update for Airflow 1.10.* backportability

Hello Airflow Community,

The email calls for a vote to update AIP-21 Changes in import paths
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
with the changes described below. The vote will last until Saturday the 9th,
2am CET (72 hours). Committers have a binding vote, but everyone from the
community is encouraged to cast an advisory vote.

*Summary*:

The proposal is to update AIP-21 to move all non-core
operators/hooks/sensors (and related files) to sub-packages within airflow:
either (protocols/software/providers) or (software/providers).
I am also happy to merge protocols+software, so if you have a strong
opinion on it, please state it with your vote and we can decide based on
the majority.

Those packages will be released separately (schedule/process TBD) and will
be backportable to the 1.10.* Airflow series, so that users can install them
and start using the new Airflow 2.0 operators in their Python 3 Airflow 1.10
environments (only Python 3.5+ is supported).

We will proceed with migrating the providers package to the already agreed
paths (following the current version of AIP-21) without waiting for the
final vote. Since we have a working POC, we know the agreed paths will work
for us.
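Point 2 of the details below mentions keeping deprecated import aliases in the old places while modules move to their new paths. The following is a minimal sketch of one way to do that, not necessarily how the POC implements it. It assumes Python 3.7+ because it relies on PEP 562 module-level `__getattr__` (newer than the Python 3.5 baseline above). The module paths and the `S3Hook` stand-in are illustrative only; the modules are created in memory and never registered for import.

```python
import types
import warnings


def make_deprecated_alias(old_name, new_module):
    """Build a module object for the old import path that forwards attribute
    lookups to new_module and emits a DeprecationWarning on each lookup.

    Relies on PEP 562 (module-level __getattr__), available in Python 3.7+.
    """
    alias = types.ModuleType(old_name)

    def __getattr__(name):
        # Let dunder probes (e.g. __path__ checked by the import system) fail
        # normally instead of warning and forwarding them.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        warnings.warn(
            f"{old_name} is deprecated; please use {new_module.__name__}.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, name)

    alias.__getattr__ = __getattr__
    return alias


# Hypothetical new location of a hook after the move.
new_hooks = types.ModuleType("airflow.providers.amazon.aws.hooks.s3")


class S3Hook:
    """Illustrative stand-in for a hook that now lives in the new package."""


new_hooks.S3Hook = S3Hook

# Hypothetical old path kept as a deprecated alias.
old_hooks = make_deprecated_alias("airflow.hooks.S3_hook", new_hooks)

# Accessing the class through the old path still works, but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hook_cls = old_hooks.S3Hook

print(hook_cls is S3Hook, caught[0].category.__name__)  # prints: True DeprecationWarning
```

On Python 3.5/3.6 a common alternative is to call `warnings.warn(..., DeprecationWarning)` once at the top of the old module and re-export the names from the new one.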

*Previous discussions*:

   - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
   - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E

*More Details*:

1) We are going in the direction of AIP-8 but not yet reaching it: the focus
is on separating out backportable packages installable in Airflow 1.10.*
releases. Airflow 2.0 will still be installed as a whole and all the source
will be kept in one repo, but we now have a way to build backportable
packages for groups of operators. POC available here:
https://github.com/apache/airflow/pull/6507 (based on Ash's
https://github.com/ashb/airflow-submodule-test)

2) We move all integrations to new packages (keeping deprecated import
aliases in the old places), with the following split (according to
"stewardship" over the integrations):

   - *fundamentals* - the core of Airflow; they are really part of Apache
   Airflow. Stewards: the core Airflow team. Not backportable/separated out.
   - *protocols* - are not owned by anyone, they are public and the
   implementation is fully "open". There are no particular stewards (no need).
   Users of particular protocols should mainly maintain those and add support
   for different versions of the protocols.
   - *software* - both the API and the software are controlled by someone
   outside of Airflow (a commercial or open-source project), but the
   deployment of that software is "owned" by the user installing Airflow. The
   stewards might also be the users, but the controlling party (Oracle, for
   example) might be interested in maintaining those operators as well.
   - *providers* - API/software/deployments are fully controlled by a 3rd
   party. Here the "provider" will most likely be interested in maintaining
   the operators (and, like Google for example, in providing integration
   guidelines
   <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
   for their hooks/operators/sensors)


3) Transfer operators between two providers should be kept with the
"target" rather than the "source".
For example, S3 -> GCS should be in the "google" provider, but GCS -> S3
should be in "amazon".

4) Transfer operators with a provider on only one side should be kept with
that "provider", regardless of whether it is the target or the source.
For example, both GCS -> SFTP and SFTP -> GCS should be in the "google"
provider.

5) If in doubt we will discuss individual cases separately.
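The placement rules in points 3 to 5 can be expressed as a small decision function. This is only a sketch. The `PROVIDERS` mapping is a hypothetical, partial classification, and any side not listed there (for example sftp) is treated as a protocol/software side with no provider. Treating "no provider on either side" as one of the "in doubt" cases from point 5 is an assumption.

```python
from typing import Dict, Optional

# Hypothetical, partial mapping from integration to provider package.
# Anything not listed (e.g. "sftp", "ftp") is treated as a protocol/software
# side that has no provider.
PROVIDERS: Dict[str, str] = {
    "s3": "amazon",
    "gcs": "google",
    "google_sheets": "google",
}


def transfer_operator_home(source: str, target: str) -> Optional[str]:
    """Return the provider package that should host a source -> target transfer
    operator, or None when the rules do not decide and the case needs discussion."""
    source_provider = PROVIDERS.get(source)
    target_provider = PROVIDERS.get(target)
    if source_provider and target_provider:
        # Rule 3: between two providers, keep it with the target.
        return target_provider
    if source_provider or target_provider:
        # Rule 4: only one side is a provider, so keep it with that provider.
        return source_provider or target_provider
    # Rule 5: no provider on either side; discuss individually.
    return None


for src, dst in [("s3", "gcs"), ("gcs", "s3"), ("gcs", "sftp"), ("sftp", "gcs")]:
    print(f"{src} -> {dst}: {transfer_operator_home(src, dst)}")
```

Under these rules a Google Sheets -> S3 operator lands in "amazon" even though most of its logic sits on the Google side, which is the case Daniel raises further down the thread.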

J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
>
> Another question: operators like SlackWebhookOperator depend on
> SimpleHttpOperator! Will this cause dependency issues, or should this be
> OK with proper versioning?
>

Very good question, Kaxil! This is one of the reasons we do not yet want to
do a full AIP-8 implementation. There will be dependencies between the
packages (including pip dependencies) that will make it difficult to manage
them fully independently.
In this version of the AIP-21 proposal, all the operators in 2.0 will still
be released together with the main Airflow package (as was done for Airflow
1.10), from one repository.

We have not yet discussed the versioning scheme for the backport packages. I
think we can decide separately, later on, how exactly we name those
versions. But I think we will make a single "snapshot" of all packages
moved to "providers" (for backporting purposes) and release them together.
They will have a single version, and cross-dependencies between the packages
if we find they are needed. For example, we could add a "slack" -> "http"
dependency while we build the package.
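A sketch of the "single snapshot" idea: every backport package shares one version, and a cross-dependency such as "slack" -> "http" becomes a requirement pinned to that same version. The `apache-airflow-[package]-backport` naming comes from the executive summary further down the thread and was still under discussion; the version string and package set below are placeholders.

```python
from typing import Dict, List

# Placeholder values: the naming and versioning scheme was still TBD.
SNAPSHOT_VERSION = "0.0.1"
SNAPSHOT_PACKAGES = {"google", "amazon", "http", "slack"}
# Cross-dependencies discovered while building the snapshot (slack -> http).
CROSS_DEPS: Dict[str, List[str]] = {"slack": ["http"]}


def backport_name(package: str) -> str:
    """Distribution name following the proposed 'apache-airflow-[package]-backport' pattern."""
    return f"apache-airflow-{package}-backport"


def cross_requirements(package: str) -> List[str]:
    """Requirements on other backport packages, all pinned to the shared snapshot
    version because the whole snapshot is released together."""
    requirements = []
    for dep in CROSS_DEPS.get(package, []):
        if dep not in SNAPSHOT_PACKAGES:
            raise ValueError(f"{package} depends on {dep}, which is not in the snapshot")
        requirements.append(f"{backport_name(dep)}=={SNAPSHOT_VERSION}")
    return requirements


print(cross_requirements("slack"))
```

Pinning to the exact snapshot version keeps pip from mixing packages built from different snapshots; allowing a looser range would be a separate decision.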

In this AIP-21 backporting scenario, we only have to worry about matching
the full set of pip dependencies between the backport releases and the few
latest 1.10.* releases. This should be doable and testable by installing
the backport packages with recent Airflow releases. We can automate this
:).
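The compatibility check described above could be automated roughly as follows. This sketch assumes one fresh virtualenv per Airflow 1.10.* release: each job installs that release together with the full set of backport packages and then runs `pip check`, which reports installed packages with missing or conflicting requirements. The backport package names are hypothetical, and 1.10.5/1.10.6 are just example releases.

```python
import subprocess
from typing import List, Tuple

Job = Tuple[str, List[List[str]]]


def compatibility_jobs(airflow_versions: List[str], backport_packages: List[str]) -> List[Job]:
    """Build one job per Airflow release: install it together with all backport
    packages, then verify the resulting environment with `pip check`."""
    jobs = []
    for version in airflow_versions:
        commands = [
            ["python", "-m", "pip", "install", f"apache-airflow=={version}", *backport_packages],
            ["python", "-m", "pip", "check"],
        ]
        jobs.append((version, commands))
    return jobs


def run_job(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure.
    Meant to be run inside a fresh virtualenv for each job."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example releases; package names are hypothetical.
    for version, commands in compatibility_jobs(
        ["1.10.5", "1.10.6"],
        ["apache-airflow-google-backport", "apache-airflow-slack-backport"],
    ):
        print(f"# Airflow {version}")
        run_job(commands)
```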

The best thing is that this whole backporting exercise will help us learn
about all such dependencies (including pip dependencies). In the POC
https://github.com/apache/airflow/pull/6507 you can see that we can define a
separate dependency set for every package (for example, the google package
depends on the 'gcp' extra). We can even have a different set of
constraints if we find that certain backport packages need some additional
limits on the versions of their pip dependencies.
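Per-package dependency sets with optional extra limits could be assembled along these lines. The requirement strings below are placeholders, not the real contents of Airflow's `gcp` extra, and the limit for the google package is hypothetical. Only the idea that the google package reuses the `gcp` extra comes from the POC description above.

```python
import re
from typing import Dict, List

# Placeholder requirement strings for illustration only; they are NOT the
# actual contents of Airflow's extras.
EXTRAS: Dict[str, List[str]] = {
    "gcp": ["google-cloud-storage>=1.16", "httplib2"],
}

# Which Airflow extras each backport package reuses (google -> 'gcp').
PACKAGE_EXTRAS: Dict[str, List[str]] = {"google": ["gcp"]}

# Hypothetical additional limits that apply to one backport package only.
PACKAGE_LIMITS: Dict[str, Dict[str, str]] = {"google": {"google-cloud-storage": "<2"}}


def project_name(requirement: str) -> str:
    """Leading project name of a simple requirement string, lower-cased."""
    return re.match(r"[A-Za-z0-9._-]+", requirement).group(0).lower()


def add_limit(requirement: str, limit: str) -> str:
    """Append a version limit; assumes no extras brackets or environment markers."""
    has_specifier = any(ch in requirement for ch in "<>=!~")
    return f"{requirement},{limit}" if has_specifier else f"{requirement}{limit}"


def package_requirements(package: str) -> List[str]:
    """Merge the requirements of the package's extras, then apply its own limits."""
    reqs: Dict[str, str] = {}
    for extra in PACKAGE_EXTRAS.get(package, []):
        for req in EXTRAS[extra]:
            reqs[project_name(req)] = req
    for name, limit in PACKAGE_LIMITS.get(package, {}).items():
        key = name.lower()
        reqs[key] = add_limit(reqs.get(key, name), limit)
    return sorted(reqs.values())


print(package_requirements("google"))
```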

As we do the exercise, produce backport releases and learn from them, we can
make much better decisions that might eventually lead to AIP-8.

J.

> On Mon, Nov 11, 2019 at 3:22 PM Kamil Breguła <ka...@polidea.com> wrote:
>
> >  One more question. Are you sure you want to move Python and Bash from
> > core?  These are the elements that are installed in every environment
> > because they are required by Airflow, so moving them to a separate
> > installed package is pointless in my opinion.
> >
> > On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com> wrote:
> > >
> > > I am fine with this list +1
> > >
> > > On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >
> > > > I am all for it Kamil!
> > > >
> > > > Super happy to treat Apache projects in the same way as "proprietary"
> > > > providers :). Anyone else has some other comments ?
> > > >
> > > > J.
> > > >
> > > > On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
> > > >
> > > > > I looked at this list and I'm only worried about two operators.
> > > > >
> > > > > airflow.contrib.operators.vertica_to_hive
> > > > > airflow.contrib.operators.s3_to_hive
> > > > >
> > > > > If we want the operators to be grouped according to destination, then
> > > > > these operators should be in the apache package. It is the members of
> > > > > the Apache community who will care most about these operators being of
> > > > > high quality. Apache can be treated equally with other large cloud
> > > > > providers, such as GCP and AWS. I can imagine that a new Apache product
> > > > > will appear and will want to be promoted the same way products of cloud
> > > > > providers are promoted: by creating a large number of integrations that
> > > > > allow you to copy data into its operating range.
> > > > > There is another case: building a strong Apache community. As members
> > > > > of the Apache community, we should promote Apache products to ensure
> > > > > that the community develops correctly, and therefore also promote
> > > > > integrations between our products and other products.
> > > > >
> > > > > On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >
> > > > > > Just to select the "packages" for this update: does anyone have
> > > > > > objections to this structure? Details, including transfer operators,
> > > > > > are in:
> > > > > >
> > > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > > > >
> > > > > > *Fundamentals (no change)*
> > > > > >
> > > > > > providers
> > > > > >     google
> > > > > >         cloud
> > > > > >         gsuite
> > > > > >         marketing_platform
> > > > > >     amazon
> > > > > >         aws
> > > > > >     microsoft
> > > > > >         azure
> > > > > >     apache
> > > > > >         cassandra
> > > > > >         druid
> > > > > >         hadoop
> > > > > >         hive
> > > > > >         pig
> > > > > >         pinot
> > > > > >         spark
> > > > > >         sqoop
> > > > > >     mysql
> > > > > >     jira
> > > > > >     databricks
> > > > > >     datadog
> > > > > >     dingding
> > > > > >     discord
> > > > > >     cloudant
> > > > > >     jenkins
> > > > > >     opsgenie
> > > > > >     qubole
> > > > > >     salesforce
> > > > > >     segment
> > > > > >     slack
> > > > > >     snowflake
> > > > > >     vertica
> > > > > >     zendesk
> > > > > >     celery
> > > > > >     docker
> > > > > >     bash
> > > > > >     kubernetes
> > > > > >     mssql
> > > > > >     mongodb
> > > > > >     mysql
> > > > > >     openfaas
> > > > > >     oracle
> > > > > >     papermill
> > > > > >     postgres
> > > > > >     presto
> > > > > >     python
> > > > > >     redis
> > > > > >     samba
> > > > > >     sqlite
> > > > > >     imap
> > > > > >     ssh
> > > > > >     filesystem
> > > > > >     sftp
> > > > > >     ftp
> > > > > >     http
> > > > > >     grpc
> > > > > >     smtp
> > > > > >     jdbc
> > > > > >     winrm
> > > > > >
> > > > > > On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >
> > > > > > > Let me then cancel this vote, and I will restart it next week.
> > > > > > >
> > > > > > > Yeah. It's a bit like re-opening Pandora's box, but now that we know
> > > > > > > that we can do it, and we are unblocked in moving to google (which is
> > > > > > > now the biggest move in progress), we can spend more time on getting
> > > > > > > better (and more final) consensus.
> > > > > > > I decided to go through the list from the docs (once again, Kamil,
> > > > > > > great that you did it) and prepared this spreadsheet showing the
> > > > > > > structure. I went through ALL the operators and put them where our
> > > > > > > current rules place them.
> > > > > > >
> > > > > > > After this exercise, I think the following makes sense:
> > > > > > > - put all the stuff except fundamentals in *"providers"* (everything
> > > > > > > in "providers" will be potentially backportable).
> > > > > > > - group Apache projects under *"apache"*, similar to
> > > > > > > google/amazon/microsoft (a different kind of ownership, but still an
> > > > > > > ownership).
> > > > > > > - for the rest, I think we can really put the operators in folders
> > > > > > > per "service/company" (without sub-packages). That includes
> > > > > > > sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]?).
> > > > > > > There is no "ownership" there and no reason to group them. That will
> > > > > > > put "operators/hooks/sensors" at different levels in the directory
> > > > > > > tree, but we already have that for fundamentals and I am not too
> > > > > > > worried about it. We do not have to have everything at the same level.
> > > > > > > - I put transfer operators according to the rule that the "to" side is
> > > > > > > more important unless the other side is a public protocol (so sftp ->
> > > > > > > gcs and gcs -> sftp both go to google/gcp). I did not have any doubt
> > > > > > > where to put any transfer operator, so this is a good sign:
> > > > > > >
> > > > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > > > > >
> > > > > > > Can you please take a look and express your opinions here, so that we
> > > > > > > can have a final vote next week (for those who are not yet tired of
> > > > > > > the discussion ;)).
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > > >
> > > > > > >> Yes, that makes sense.
> > > > > > >>
> > > > > > >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
> > > > > > >>
> > > > > > >> > Hadoop is published by Apache, so it can be in the apache
> > > > > > >> > directory. This will mimic the grouping presented in the
> > > > > > >> > documentation:
> > > > > > >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> > > > > > >> >
> > > > > > >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > > >> > >
> > > > > > >> > > I think we should keep the vote open at least until mid next week to
> > > > > > >> > > get more thoughts and input on this one.
> > > > > > >> > >
> > > > > > >> > > In general, I am happy with the approach, but "operators", "hooks"
> > > > > > >> > > and "sensors" shouldn't be providers. "hadoop" can be its own
> > > > > > >> > > provider and hdfs can be a part of it.
> > > > > > >> > >
> > > > > > >> > > providers/
> > > > > > >> > >     google
> > > > > > >> > >          cloud
> > > > > > >> > >              operators
> > > > > > >> > >              hooks
> > > > > > >> > >              sensors
> > > > > > >> > >          gsuite
> > > > > > >> > >              operators
> > > > > > >> > >              ...
> > > > > > >> > >     amazon
> > > > > > >> > >          aws
> > > > > > >> > >              operators
> > > > > > >> > >              ...
> > > > > > >> > >     microsoft
> > > > > > >> > >          azure
> > > > > > >> > >              operators
> > > > > > >> > >              ...
> > > > > > >> > >     hadoop
> > > > > > >> > >         hdfs
> > > > > > >> > >              operators
> > > > > > >> > >              ...
> > > > > > >> > >
> > > > > > >> > > We can also define what a "provider" is, so we know what to add to it
> > > > > > >> > > in the future. SSH/FTP/SFTP belong to the same family. Do we want to
> > > > > > >> > > have separate providers for each one of them?
> > > > > > >> > >
> > > > > > >> > > Regards,
> > > > > > >> > > Kaxil
> > > > > > >> > >
> > > > > > >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > > >> > >
> > > > > > >> > > > I really like making everything a provider. That's a great idea!
> > > > > > >> > > > This way everything "backportable" will have to be in the
> > > > > > >> > > > "providers" package. Really nice and clean separation (and less mess
> > > > > > >> > > > in "airflow"). And we will not need any artificial grouping (we can
> > > > > > >> > > > still group them at the documentation level).
> > > > > > >> > > >
> > > > > > >> > > > We do not need "backport" in the name. I think it's more of a
> > > > > > >> > > > technical detail of naming the package, which we can work out while
> > > > > > >> > > > reviewing PRs, and we can agree on the final naming of the released
> > > > > > >> > > > packages at the PMC level (the PMC will have to vote on releasing
> > > > > > >> > > > those).
> > > > > > >> > > >
> > > > > > >> > > > The thinking was that these packages are really only intended to be
> > > > > > >> > > > backported to 1.10; we are not going (yet) to use them in Airflow
> > > > > > >> > > > 2.*, so I thought that by naming them "backport" we could express
> > > > > > >> > > > that intent more clearly.
> > > > > > >> > > >
> > > > > > >> > > > So let me clarify the structure of folders we are going to have if we
> > > > > > >> > > > follow it (I just added some examples), including the already agreed
> > > > > > >> > > > changes from AIP-21:
> > > > > > >> > > >
> > > > > > >> > > > providers/
> > > > > > >> > > >     google
> > > > > > >> > > >          cloud
> > > > > > >> > > >              operators
> > > > > > >> > > >              hooks
> > > > > > >> > > >              sensors
> > > > > > >> > > >          gsuite
> > > > > > >> > > >              operators
> > > > > > >> > > >              ...
> > > > > > >> > > >     amazon
> > > > > > >> > > >          aws
> > > > > > >> > > >              operators
> > > > > > >> > > >              ...
> > > > > > >> > > >     microsoft
> > > > > > >> > > >          azure
> > > > > > >> > > >              operators
> > > > > > >> > > >              ...
> > > > > > >> > > >     operators
> > > > > > >> > > >          sqlite.py
> > > > > > >> > > >          oracle.py
> > > > > > >> > > >          docker.py
> > > > > > >> > > >     hooks
> > > > > > >> > > >          hdfs.py
> > > > > > >> > > >          sqlite.py
> > > > > > >> > > >     sensors
> > > > > > >> > > >          http.py
> > > > > > >> > > >          sql.py
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > J.
> > > > > > >> > > >
> > > > > > >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Do we need to include `-backport`? What was the thinking behind
> > > > > > >> > > > > that?
> > > > > > >> > > > >
> > > > > > >> > > > > I think software and protocol should be merged. I would also say
> > > > > > >> > > > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
> > > > > > >> > > > > for instance, is what I would prefer.
> > > > > > >> > > > >
> > > > > > >> > > > > -a
> > > > > > >> > > > >
> > > > > > >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > > >> > > > > >One more day to go. I would love to see some opinions on this
> > > > > > >> > > > > >AIP-21 update :).
> > > > > > >> > > > > >
> > > > > > >> > > > > >Executive summary:
> > > > > > >> > > > > >
> > > > > > >> > > > > >* we will be moving a number of integrations to sub-packages of
> > > > > > >> > > > > >airflow.
> > > > > > >> > > > > >* they will be backportable to 1.10.*. There will be
> > > > > > >> > > > > >'apache-airflow-[package]-backport' packages, pip-installable with
> > > > > > >> > > > > >Python 3, that will make Airflow 2.0 operators/hooks etc. available
> > > > > > >> > > > > >in Airflow 1.10.*.
> > > > > > >> > > > > >* the current proposal for sub-packages is
> > > > > > >> > > > > >"protocols/software/providers/" (but if you think merging protocols
> > > > > > >> > > > > >and software makes sense, please express your opinion).
> > > > > > >> > > > > >* we are not moving "fundamental" operators/hooks etc.
> > > > > > >> > > > > >* Airflow 2.0 is still going to be installed as a single package
> > > > > > >> > > > > >with all operators (so we are not yet implementing AIP-8).
> > > > > > >> > > > > >
> > > > > > >> > > > > >J.
> > > > > > >> > > > > >
> > > > > > >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > >> I think all these cases are valid, but maybe I was not super clear.
> > > > > > >> > > > > >> It's only the transfer operators that we need to decide where to
> > > > > > >> > > > > >> put, not hooks. Usually the complexity of communicating with
> > > > > > >> > > > > >> particular storages is (or at least should be) in the Hooks rather
> > > > > > >> > > > > >> than the Operators.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Operators should be just thin wrappers over the logic in the hooks.
> > > > > > >> > > > > >> Hooks are going to stay where they belong: S3 Hooks in amazon, GCS
> > > > > > >> > > > > >> Hooks in google.cloud, GoogleSheet Hooks in google.gsuite.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> Since we actually have a mono-repo, it will be no problem (and no
> > > > > > >> > > > > >> cross-dependency problem) to have the S3 -> GCS operator in google
> > > > > > >> > > > > >> and use hooks from both google and amazon.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> I hope this alleviates your concern, Daniel?
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> J.
> > > > > > >> > > > > >>
> > > > > > >> > > > > >>
> > > > > > >> > > > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
> > > > > > >> > > > > >>> the target, i.e. the storage? But GoogleSheetsToSftp would be in the
> > > > > > >> > > > > >>> google sheets operators file? The complexity, and the shared code,
> > > > > > >> > > > > >>> are in the gsheet component, not in the storage destination.
> > > > > > >> > > > > >>>
> > > > > > >> > > > > >>> >
> > > > > > >> > > > > >>> > 5) If in doubt we will discuss individual cases
> > > > > separately.
> > > > > > >> > > > > >>> >
> > > > > > >> > > > > >>> > J.
> > > > > > >> > > > > >>> >
> > > > > > >> > > > > >>> > --
> > > > > > >> > > > > >>> >
> > > > > > >> > > > > >>> > Jarek Potiuk
> > > > > > >> > > > > >>> > Polidea <https://www.polidea.com/> | Principal
> > > > Software
> > > > > > >> > Engineer
> > > > > > >> > > > > >>> >
> > > > > > >> > > > > >>> > M: +48 660 796 129 <+48660796129>
> > > > > > >> > > > > >>> > [image: Polidea] <https://www.polidea.com/>
> > > > > > >> > > > > >>> >
>
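
Point 2 of the proposal quoted above keeps deprecated import aliases in the old places. Below is a minimal sketch of one way such an alias could behave, assuming Python 3.7+ because it relies on the module-level __getattr__ hook from PEP 562. The module names (contrib_ssh_operator, providers_ssh_operator) are hypothetical, and the modules are built in memory so the example runs on its own. It is an illustration, not the actual Airflow implementation.

```python
import sys
import types
import warnings


def install_deprecated_alias(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register `old_name` in sys.modules as an alias that forwards attribute
    lookups to `new_module` and emits a DeprecationWarning on each lookup."""
    alias = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # PEP 562: only called when normal attribute lookup on `alias` fails.
        if attr.startswith("__") and attr.endswith("__"):
            # Let import-machinery probes such as __path__ fail quietly.
            raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")
        warnings.warn(
            f"{old_name} is deprecated; import {attr} from {new_module.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    alias.__getattr__ = __getattr__
    sys.modules[old_name] = alias
    return alias


# Hypothetical stand-in for a relocated module (e.g. an SSH operator moved
# from a contrib path to a providers path).
new_ssh = types.ModuleType("providers_ssh_operator")


class SSHOperator:
    """Placeholder for the operator class that was moved."""


new_ssh.SSHOperator = SSHOperator
install_deprecated_alias("contrib_ssh_operator", new_ssh)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from contrib_ssh_operator import SSHOperator as LegacySSHOperator

print(LegacySSHOperator is SSHOperator, caught[0].category.__name__)
# prints: True DeprecationWarning
```

On Python 3.5/3.6, which the backport packages also target, the simpler alternative is an old-path module that calls warnings.warn(..., DeprecationWarning) once at import time and re-exports the names from the new path.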


Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kaxil Naik <ka...@gmail.com>.
Another question: operators like SlackWebhookOperator depend on
SimpleHttpOperator. Will this cause dependency issues, or should this be OK
with proper versioning?
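
One way to make this question concrete, sketched under assumptions: if each provider package is released separately, every import that crosses from one provider into another (as SlackWebhookOperator subclassing SimpleHttpOperator would) has to become a declared, versioned dependency of that package. The minimal sketch below finds such cross-provider imports with the standard ast module. The airflow.providers.<name> layout and the module paths in the example are assumptions based on the structure proposed in this thread, not confirmed release tooling.

```python
import ast
from collections import defaultdict
from typing import Dict, List, Optional

PROVIDERS_PREFIX = "airflow.providers."


def provider_of(module: str) -> Optional[str]:
    """Map a dotted module path to its top-level provider package, if any."""
    if not module.startswith(PROVIDERS_PREFIX):
        return None
    return module[len(PROVIDERS_PREFIX):].split(".")[0]


def cross_provider_deps(sources: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each provider, list the *other* providers its source files import."""
    deps = defaultdict(set)
    for provider, texts in sources.items():
        for text in texts:
            for node in ast.walk(ast.parse(text)):
                if isinstance(node, ast.ImportFrom) and node.module:
                    imported = [node.module]
                elif isinstance(node, ast.Import):
                    imported = [alias.name for alias in node.names]
                else:
                    continue
                for module in imported:
                    other = provider_of(module)
                    if other is not None and other != provider:
                        deps[provider].add(other)
    return {provider: sorted(found) for provider, found in deps.items()}


# Hypothetical module sources following the proposed layout.
slack_src = """
from airflow.providers.http.operators.http import SimpleHttpOperator

class SlackWebhookOperator(SimpleHttpOperator):
    pass
"""
http_src = """
from airflow.models import BaseOperator

class SimpleHttpOperator(BaseOperator):
    pass
"""

print(cross_provider_deps({"slack": [slack_src], "http": [http_src]}))
# prints: {'slack': ['http']}
```

Release tooling could then turn {'slack': ['http']} into a version-constrained requirement of the slack package on the http package; inside the mono-repo itself no such declaration is needed.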

On Mon, Nov 11, 2019 at 3:22 PM Kamil Breguła <ka...@polidea.com> wrote:

> One more question. Are you sure you want to move Python and Bash out of
> core? These operators are installed in every environment because Airflow
> requires them, so moving them to a separately installed package is
> pointless in my opinion.
>
> On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > I am fine with this list +1
> >
> > On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Ja...@polidea.com> wrote:
> >
> > > I am all for it, Kamil!
> > >
> > > Super happy to treat Apache projects in the same way as "proprietary"
> > > providers :). Does anyone else have any other comments?
> > >
> > > J.
> > >
> > > On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
> > >
> > > > I looked at this list and I'm only worried about two operators.
> > > >
> > > > airflow.contrib.operators.vertica_to_hive
> > > > airflow.contrib.operators.s3_to_hive
> > > >
> > > > If we want the operators to be grouped according to destination, then
> > > > these operators should be in the apache package. It is the members of
> > > > the Apache community who will care most about these operators being of
> > > > high quality. Apache can be treated equally with other large cloud
> > > > providers, such as GCP and AWS. I can imagine a new Apache product
> > > > appearing that wants to be promoted the same way the products of cloud
> > > > providers are promoted: by creating a large number of integrations that
> > > > let you copy data into it.
> > > > There is another reason: building a strong Apache community. As
> > > > members of the Apache community, we should promote Apache products to
> > > > ensure that the community develops well, and that includes integrating
> > > > our products with other products.
> > > >
> > > > On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > >
> > > > > Just to pin down the "packages" for this update: does anyone have
> > > > > objections to this structure (details, including transfer operators,
> > > > > are in
> > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0)?
> > > > >
> > > > > *Fundamentals (no change)*
> > > > >
> > > > > providers
> > > > >     google
> > > > >         cloud
> > > > >         gsuite
> > > > >         marketing_platform
> > > > >     amazon
> > > > >         aws
> > > > >     microsoft
> > > > >         azure
> > > > >     apache
> > > > >         cassandra
> > > > >         druid
> > > > >         hadoop
> > > > >         hive
> > > > >         pig
> > > > >         pinot
> > > > >         spark
> > > > >         sqoop
> > > > >     mysql
> > > > >     jira
> > > > >     databricks
> > > > >     datadog
> > > > >     dingding
> > > > >     discord
> > > > >     cloudant
> > > > >     jenkins
> > > > >     opsgenie
> > > > >     qubole
> > > > >     salesforce
> > > > >     segment
> > > > >     slack
> > > > >     snowflake
> > > > >     vertica
> > > > >     zendesk
> > > > >     celery
> > > > >     docker
> > > > >     bash
> > > > >     kubernetes
> > > > >     mssql
> > > > >     mongodb
> > > > >     mysql
> > > > >     openfaas
> > > > >     oracle
> > > > >     papermill
> > > > >     postgres
> > > > >     presto
> > > > >     python
> > > > >     redis
> > > > >     samba
> > > > >     sqlite
> > > > >     imap
> > > > >     ssh
> > > > >     filesystem
> > > > >     sftp
> > > > >     ftp
> > > > >     http
> > > > >     grpc
> > > > >     smtp
> > > > >     jdbc
> > > > >     winrm
> > > > >
> > > > > On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > >
> > > > > > Let me then cancel this vote and I will restart it next week.
> > > > > >
> > > > > > Yeah. It's a bit like re-opening Pandora's box, but now that we know
> > > > > > that we can do it, and we are unblocked in moving to google (which is
> > > > > > now the biggest move in progress), we can spend more time on reaching
> > > > > > a better (and more final) consensus.
> > > > > > I decided to go through the list from the docs (once again, Kamil -
> > > > > > great that you did it) and prepared this spreadsheet showing the
> > > > > > structure. I went through ALL the operators and put each of them where
> > > > > > our current rules place it.
> > > > > >
> > > > > > After this exercise, I think the following makes sense:
> > > > > > - put all the stuff except fundamentals in *"providers"* (everything
> > > > > > in "providers" will be potentially backportable).
> > > > > > - group Apache projects under *"apache"* - similar to
> > > > > > google/amazon/microsoft (a different kind of ownership, but still an
> > > > > > ownership).
> > > > > > - for the rest, I think we can simply put the operators in folders per
> > > > > > "service/company" (without sub-packages). That includes sftp/ssh/ftp
> > > > > > etc. (should we group [ftp and sftp] or [ssh and sftp]??). There is no
> > > > > > "ownership" there and no reason to group them. That will put
> > > > > > "operators/hooks/sensors" at different levels in the directory tree,
> > > > > > but we already have that for fundamentals and I am not too worried
> > > > > > about it. We do not have to have everything at the same level.
> > > > > > - I put transfer operators according to the rule that the "to" side
> > > > > > is more important unless the other side is a public protocol (so
> > > > > > sftp -> gcs and gcs -> sftp both go to google/gcp). I did not have any
> > > > > > doubt where to put which transfer operator, so this is a good sign:
> > > > > >
> > > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > > > >
> > > > > > Can you please take a look and express your opinions here, so that we
> > > > > > can have the final vote next week (for those who are not yet tired of
> > > > > > the discussion ;)).
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > > > >
> > > > > >> Yes, that makes sense.
> > > > > >>
> > > > > >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
> > > > > >>
> > > > > >> > Hadoop is published by Apache, so it can be in the apache
> > > > > >> > directory. This will mimic the grouping presented in the
> > > > > >> > documentation:
> > > > > >> >
> > > > > >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> > > > > >> >
> > > > > >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > I think we should keep the vote open at least until the middle of
> > > > > >> > > next week to get more thoughts and input on this one.
> > > > > >> > >
> > > > > >> > > In general, I am happy with the approach, but operators/hooks and
> > > > > >> > > sensors shouldn't be providers. "hadoop" can be its own provider
> > > > > >> > > and hdfs can be a part of it.
> > > > > >> > >
> > > > > >> > > providers/
> > > > > >> > >     google
> > > > > >> > >          cloud
> > > > > >> > >              operators
> > > > > >> > >              hooks
> > > > > >> > >              sensors
> > > > > >> > >          gsuite
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >     amazon
> > > > > >> > >          aws
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >     microsoft
> > > > > >> > >          azure
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >     hadoop
> > > > > >> > >         hdfs
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >
> > > > > >> > > We can also define what a "provider" is, so we know what to add to
> > > > > >> > > it in the future. SSH/FTP/SFTP belong to the same family. Do we want
> > > > > >> > > to have separate providers for each one of them???
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Kaxil
> > > > > >> > >
> > > > > >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >> > >
> > > > > >> > > > I really like making everything a provider. That's a great idea!
> > > > > >> > > > This way everything "backportable" will have to be in the
> > > > > >> > > > "providers" package. A really nice and clean separation (and less
> > > > > >> > > > mess in "airflow"). And we will not need any artificial grouping
> > > > > >> > > > (we can still group them at the documentation level).
> > > > > >> > > >
> > > > > >> > > > We do not need "backport" in the name. I think it's more of a
> > > > > >> > > > technical detail of naming the package, which we can work out
> > > > > >> > > > while reviewing PRs, and we can agree on the final naming of the
> > > > > >> > > > released packages at the PMC level (PMCs will have to vote on
> > > > > >> > > > releasing those).
> > > > > >> > > >
> > > > > >> > > > The thinking was that these packages are really intended only to
> > > > > >> > > > be backported to 1.10 - we are not going (yet) to use the packages
> > > > > >> > > > in Airflow 2.*, so I thought that by naming them "backport" we
> > > > > >> > > > could express that intent more clearly.
> > > > > >> > > >
> > > > > >> > > > So let me clarify the structure of folders we are going to have
> > > > > >> > > > if we follow it (I just added some examples), including the
> > > > > >> > > > already agreed changes from AIP-21:
> > > > > >> > > >
> > > > > >> > > > providers/
> > > > > >> > > >     google
> > > > > >> > > >          cloud
> > > > > >> > > >              operators
> > > > > >> > > >              hooks
> > > > > >> > > >              sensors
> > > > > >> > > >          gsuite
> > > > > >> > > >              operators
> > > > > >> > > >              ...
> > > > > >> > > >     amazon
> > > > > >> > > >          aws
> > > > > >> > > >              operators
> > > > > >> > > >              ...
> > > > > >> > > >     microsoft
> > > > > >> > > >          azure
> > > > > >> > > >              operators
> > > > > >> > > >              ...
> > > > > >> > > >     operators
> > > > > >> > > >          sqlite.py
> > > > > >> > > >          oracle.py
> > > > > >> > > >          docker.py
> > > > > >> > > >     hooks
> > > > > >> > > >          hdfs.py
> > > > > >> > > >          sqlite.py
> > > > > >> > > >     sensors
> > > > > >> > > >          http.py
> > > > > >> > > >          sql.py
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > J.
> > > > > >> > > >
> > > > > >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > > >> > > >
> > > > > >> > > > > Do we need to include `-backport`? What was the thinking behind
> > > > > >> > > > > that?
> > > > > >> > > > >
> > > > > >> > > > > I think software and protocol should be merged. I would also say
> > > > > >> > > > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
> > > > > >> > > > > for instance, is what I would prefer.
> > > > > >> > > > >
> > > > > >> > > > > -a
> > > > > >> > > > >
> > > > > >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >> > > > > >One more day to go. I would love to see some opinions on this
> > > > > >> > > > > >AIP-21 update :).
> > > > > >> > > > > >
> > > > > >> > > > > >Executive summary:
> > > > > >> > > > > >
> > > > > >> > > > > >* we will be moving a number of integrations to sub-packages of
> > > > > >> > > > > >airflow.
> > > > > >> > > > > >* they will be backportable to 1.10.*. There will be
> > > > > >> > > > > >'apache-airflow-[package]-backport' packages, installable from
> > > > > >> > > > > >PyPI with Python 3, that will make Airflow 2.0 operators/hooks
> > > > > >> > > > > >etc. available alongside the 1.10.* operators.
> > > > > >> > > > > >* the current proposal for sub-packages is
> > > > > >> > > > > >"protocols/software/providers/" (but if you think merging
> > > > > >> > > > > >protocols and software makes sense, please express your opinion).
> > > > > >> > > > > >* we are not moving "fundamental" operators/hooks etc.
> > > > > >> > > > > >* Airflow 2.0 is still going to be installed as a single package
> > > > > >> > > > > >with all operators (so we are not yet implementing AIP-8).
> > > > > >> > > > > >
> > > > > >> > > > > >J.
> > > > > >> > > > > >
> > > > > >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> I think all these cases are valid, but maybe I was not
> > > > > >> > > > > >> super-clear. It's only the transfer operators that we need to
> > > > > >> > > > > >> decide where to put - not the hooks. Usually the complexity of
> > > > > >> > > > > >> communicating with particular storages is (or at least should
> > > > > >> > > > > >> be) in the Hooks rather than the Operators.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Operators should be just thin wrappers over the logic in the
> > > > > >> > > > > >> hooks. Hooks are going to stay where they belong - S3 Hooks in
> > > > > >> > > > > >> amazon, GCS Hooks in google.cloud, GoogleSheet Hooks in
> > > > > >> > > > > >> google.gsuite.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Since we actually have a mono-repo, it will be no problem (and no
> > > > > >> > > > > >> cross-dependency problem) to have the S3 -> GCS operator in
> > > > > >> > > > > >> google and use hooks from both google and amazon.
> > > > > >> > > > > >>
> > > > > >> > > > > >> I hope this alleviates your concern, Daniel?
> > > > > >> > > > > >>
> > > > > >> > > > > >> J.
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? Would you put
> > > > > >> > > > > >>> these in the target, i.e. the storage? But GoogleSheetsToSftp
> > > > > >> > > > > >>> would be in the google sheets operators file? The complexity,
> > > > > >> > > > > >>> and the shared code, are in the gsheet component -- not in the
> > > > > >> > > > > >>> storage destination.
> > > > > >> > > > > >>>
> > > > > >> > > > > >>>
>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
I updated the spreadsheet and put the Bash and Python operators into fundamentals.
I also treat Apache projects the same way as "proprietary" providers.

I will restart the vote then :)

J.


On Mon, Nov 11, 2019 at 7:21 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Ok. Happy to move it back then :). No problem with that.
>
> According to the rules of AIP-21 it should actually be: "*from
> airflow.providers.kubernetes.operators.pod import KubernetesPodOperator*"
> (Case 2A (drop _operator in module name) + Case 5B (keep Operator in
> class name)). We can have more than just a Pod operator for Kubernetes
> (KubernetesPod, KubernetesVolume, KubernetesIstio and many more), so
> keeping KubernetesPod in the class name and having a separate module for
> pod operator(s?) makes sense IMHO.
>
> It's similar to *from airflow.providers.google.cloud.operators.pubsub
> import PubSubTopicCreateOperator* for example.
>
> Re remote log storage: indeed. That should be part of AIP-8.
>
> J.
>
> On Mon, Nov 11, 2019 at 6:36 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> +1 for Python and Bash being in the stock install -- they are just _so_
>> commonly used that I think it makes sense to keep them in the base install.
>> (And the virtualenv module is not an onerous dep; it hasn't caused us any
>> problems. Yet.)
>>
>> Kubernetes is also a slightly funny one since the deps for that will be
>> in "core" anyway thanks to the Kube executor, but I think it probably makes
>> sense to have `from airflow.providers.kubernetes.operators import
>> KubernetesOperator`. Is that the pattern we are going with for the
>> "one-level" providers, or will it be `from
>> airflow.providers.kubernetes.operators.pod_operator import
>> KubernetesOperator`?
>>
>> Possibly more of an AIP-8 question: with moving Azure Blob/S3/GCS to
>> separate packages we might have to look at how we enable remote log storage.
>>
>> -a
>>
>>
>> > On 11 Nov 2019, at 15:53, Jarek Potiuk <Ja...@polidea.com> wrote:
>> >
>> > On Mon, Nov 11, 2019 at 4:22 PM Kamil Breguła <kamil.bregula@polidea.com>
>> > wrote:
>> >
>> >> One more question. Are you sure you want to move Python and Bash from
>> >> core?  These are the elements that are installed in every environment
>> >> because they are required by Airflow, so moving them to a separate
>> >> installed package is pointless in my opinion.
>> >>
>> > I have no problem with moving them to "fundamentals", but I am not sure
>> > they are really required. I looked through the code and, other than a few
>> > examples and tests, they are not really "required". Maybe that's enough to
>> > keep them in fundamentals.
>> > Also, the Python operator has a dependency, virtualenv, which is only
>> > required for this operator, so maybe it's worth keeping it separate from
>> > "fundamentals".
>> >
>> >
>> >> On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com>
>> >> wrote:
>> >>>
>> >>> I am fine with this list +1
>> >>>
>> >>> On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> >>> wrote:
>> >>>
>> >>>> I am all for it Kamil!
>> >>>>
>> >>>> Super happy to treat Apache projects in the same way as "proprietary"
>> >>>> providers :). Does anyone else have other comments?
>> >>>>
>> >>>> J.
>> >>>>
>> >>>> On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.bregula@polidea.com>
>> >>>> wrote:
>> >>>>
>> >>>>> I looked at this list and I'm only worried about two operators.
>> >>>>>
>> >>>>> airflow.contrib.operators.vertica_to_hive
>> >>>>> airflow.contrib.operators.s3_to_hive
>> >>>>>
>> >>>>> If we want the operators to be grouped according to destination, then
>> >>>>> these operators should be in the apache package. It is the members of
>> >>>>> the Apache community who will care most about these operators being of
>> >>>>> high quality. Apache can be treated equally with other large cloud
>> >>>>> providers, such as GCP and AWS. I can imagine that a new Apache product
>> >>>>> will appear and will want to be promoted the same way products of cloud
>> >>>>> providers are promoted: by creating a large number of integrations that
>> >>>>> allow you to copy data into its area of operation.
>> >>>>> There is another case: building a strong Apache community. As members
>> >>>>> of the Apache community, we should promote Apache products to ensure
>> >>>>> that the community develops well, and that includes integrating our
>> >>>>> products with other products.
>> >>>>>
>> >>>>> On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Just to select the "packages" for this update: does anyone have
>> >>>>>> objections to this structure (details, including transfer operators, in
>> >>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0)?
>> >>>>>>
>> >>>>>> *Fundamentals (no change)*
>> >>>>>>
>> >>>>>> providers
>> >>>>>>     google
>> >>>>>>         cloud
>> >>>>>>         gsuite
>> >>>>>>         marketing_platform
>> >>>>>>     amazon
>> >>>>>>         aws
>> >>>>>>     microsoft
>> >>>>>>         azure
>> >>>>>>     apache
>> >>>>>>         cassandra
>> >>>>>>         druid
>> >>>>>>         hadoop
>> >>>>>>         hive
>> >>>>>>         pig
>> >>>>>>         pinot
>> >>>>>>         spark
>> >>>>>>         sqoop
>> >>>>>>     mysql
>> >>>>>>     jira
>> >>>>>>     databricks
>> >>>>>>     datadog
>> >>>>>>     dingding
>> >>>>>>     discord
>> >>>>>>     cloudant
>> >>>>>>     jenkins
>> >>>>>>     opsgenie
>> >>>>>>     qubole
>> >>>>>>     salesforce
>> >>>>>>     segment
>> >>>>>>     slack
>> >>>>>>     snowflake
>> >>>>>>     vertica
>> >>>>>>     zendesk
>> >>>>>>     celery
>> >>>>>>     docker
>> >>>>>>     bash
>> >>>>>>     kubernetes
>> >>>>>>     mssql
>> >>>>>>     mongodb
>> >>>>>>     mysql
>> >>>>>>     openfaas
>> >>>>>>     oracle
>> >>>>>>     papermill
>> >>>>>>     postgres
>> >>>>>>     presto
>> >>>>>>     python
>> >>>>>>     redis
>> >>>>>>     samba
>> >>>>>>     sqlite
>> >>>>>>     imap
>> >>>>>>     ssh
>> >>>>>>     filesystem
>> >>>>>>     sftp
>> >>>>>>     ftp
>> >>>>>>     http
>> >>>>>>     grpc
>> >>>>>>     smtp
>> >>>>>>     jdbc
>> >>>>>>     winrm
>> >>>>>>
>> >>>>>> On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Let me cancel this vote then, and I will restart it next week.
>> >>>>>>>
>> >>>>>>> Yeah. It's a bit like re-opening Pandora's box, but now that we know
>> >>>>>>> that we can do it, and we are unblocked in moving to google (which is
>> >>>>>>> now the biggest move in progress), we can spend more time on reaching
>> >>>>>>> a better (and more final) consensus.
>> >>>>>>> I decided to go through the list from the docs (once again, Kamil,
>> >>>>>>> great that you did it) and prepared this spreadsheet showing the
>> >>>>>>> structure. I went through ALL the operators and put each of them where
>> >>>>>>> our current rules place it.
>> >>>>>>>
>> >>>>>>> After this exercise, I think the following makes sense:
>> >>>>>>> - put everything except fundamentals in *"providers"* (everything in
>> >>>>>>> "providers" will potentially be backportable).
>> >>>>>>> - group Apache projects under *"apache"*, similar to
>> >>>>>>> google/amazon/microsoft (a different kind of ownership, but still
>> >>>>>>> ownership).
>> >>>>>>> - for the rest, I think we can simply put the operators in folders per
>> >>>>>>> "service/company" (without sub-packages). That includes sftp/ssh/ftp
>> >>>>>>> etc. (should we group [ftp and sftp] or [ssh and sftp]?). There is no
>> >>>>>>> "ownership" there and no reason to group them. That will put
>> >>>>>>> "operators/hooks/sensors" at different levels in the directory tree,
>> >>>>>>> but we already have that for fundamentals and I am not too worried
>> >>>>>>> about it. We do not have to have everything at the same level.
>> >>>>>>> - I placed transfer operators according to the rule that the "to" side
>> >>>>>>> is more important, unless the other side is a public protocol (so
>> >>>>>>> sftp -> gcs and gcs -> sftp both go to google/gcp). I had no doubt
>> >>>>>>> about where to put any transfer operator, so this is a good sign:
>> >>>>>>>
>> >>>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
>> >>>>>>>
>> >>>>>>> Can you please take a look and express your opinions here so that we
>> >>>>>>> can have a final vote next week (for those who are not yet tired of
>> >>>>>>> the discussion ;)).
>> >>>>>>>
>> >>>>>>> J.
>> >>>>>>>
>> >>>>>>> On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Yes, that makes sense.
>> >>>>>>>>
>> >>>>>>>> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hadoop is published by Apache, so it can be in the apache
>> >>>>>>>>> directory. This will mimic the grouping presented in the
>> >>>>>>>>> documentation:
>> >>>>>>>>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxilnaik@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> I think we should keep the vote open at least until the middle of
>> >>>>>>>>>> next week to get more thoughts and input on this one.
>> >>>>>>>>>>
>> >>>>>>>>>> In general, I am happy with the approach, but operators/hooks and
>> >>>>>>>>>> sensors shouldn't be a provider. "hadoop" can be its own provider
>> >>>>>>>>>> and hdfs can be a part of it.
>> >>>>>>>>>>
>> >>>>>>>>>> providers/
>> >>>>>>>>>>    google
>> >>>>>>>>>>         cloud
>> >>>>>>>>>>             operators
>> >>>>>>>>>>             hooks
>> >>>>>>>>>>             sensors
>> >>>>>>>>>>         gsuite
>> >>>>>>>>>>             operators
>> >>>>>>>>>>             ...
>> >>>>>>>>>>    amazon
>> >>>>>>>>>>         aws
>> >>>>>>>>>>             operators
>> >>>>>>>>>>             ...
>> >>>>>>>>>>    microsoft
>> >>>>>>>>>>         azure
>> >>>>>>>>>>             operators
>> >>>>>>>>>>             ...
>> >>>>>>>>>>    hadoop
>> >>>>>>>>>>        hdfs
>> >>>>>>>>>>             operators
>> >>>>>>>>>>             ...
>> >>>>>>>>>>
>> >>>>>>>>>> We can also define what a "provider" is, so we know what to add to
>> >>>>>>>>>> it in the future. SSH/FTP/SFTP belong to the same family. Do we want
>> >>>>>>>>>> separate providers for each one of them?
>> >>>>>>>>>>
>> >>>>>>>>>> Regards,
>> >>>>>>>>>> Kaxil
>> >>>>>>>>>>
>> >>>>>>>>>> On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I really like the idea of making everything a provider. That's a
>> >>>>>>>>>>> great idea! This way everything "backportable" will have to be in the
>> >>>>>>>>>>> "providers" package. Really nice and clean separation (and less mess
>> >>>>>>>>>>> in "airflow"). And we will not have to have any artificial grouping
>> >>>>>>>>>>> (we can still group them at the documentation level).
>> >>>>>>>>>>>
>> >>>>>>>>>>> We do not need "backport" in the name. I think it's more of a
>> >>>>>>>>>>> technical detail of naming the package, which we can work out while
>> >>>>>>>>>>> reviewing PRs, and we can agree on the final naming of the released
>> >>>>>>>>>>> packages at the PMC level (the PMC will have to vote on releasing
>> >>>>>>>>>>> those).
>> >>>>>>>>>>>
>> >>>>>>>>>>> The thinking was that their intention is really only to be
>> >>>>>>>>>>> backported to 1.10; we are not going (yet) to use the packages in
>> >>>>>>>>>>> Airflow 2.*, so I thought that naming them "backport" would express
>> >>>>>>>>>>> that intent more clearly.
>> >>>>>>>>>>>
>> >>>>>>>>>>> So let me clarify the structure of folders we are going to have if
>> >>>>>>>>>>> we follow it (I just added some examples), including the already
>> >>>>>>>>>>> agreed changes from AIP-21:
>> >>>>>>>>>>>
>> >>>>>>>>>>> providers/
>> >>>>>>>>>>>    google
>> >>>>>>>>>>>         cloud
>> >>>>>>>>>>>             operators
>> >>>>>>>>>>>             hooks
>> >>>>>>>>>>>             sensors
>> >>>>>>>>>>>         gsuite
>> >>>>>>>>>>>             operators
>> >>>>>>>>>>>             ...
>> >>>>>>>>>>>    amazon
>> >>>>>>>>>>>         aws
>> >>>>>>>>>>>             operators
>> >>>>>>>>>>>             ...
>> >>>>>>>>>>>    microsoft
>> >>>>>>>>>>>         azure
>> >>>>>>>>>>>             operators
>> >>>>>>>>>>>             ...
>> >>>>>>>>>>>    operators
>> >>>>>>>>>>>         sqlite.py
>> >>>>>>>>>>>         oracle.py
>> >>>>>>>>>>>         docker.py
>> >>>>>>>>>>>    hooks
>> >>>>>>>>>>>         hdfs.py
>> >>>>>>>>>>>         sqlite.py
>> >>>>>>>>>>>    sensors
>> >>>>>>>>>>>         http.py
>> >>>>>>>>>>>         sql.py
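A quick way to see how a layout like this maps to import paths, and why "operators" ends up at different depths, is to walk a nested-dict model of the tree above. This is only a sketch: the dict encoding (a dict for a package, `{}` when its modules are not listed, `None` for a `.py` module) and the `airflow.providers` root are assumptions made for the example.

```python
from typing import Dict, Iterator, Optional

# Nested-dict model of the example tree: a dict is a package, None is a
# module file, and an empty dict is a package whose modules are not listed.
LAYOUT: Dict[str, Optional[dict]] = {
    "google": {
        "cloud": {"operators": {}, "hooks": {}, "sensors": {}},
        "gsuite": {"operators": {}},
    },
    "amazon": {"aws": {"operators": {}}},
    "microsoft": {"azure": {"operators": {}}},
    "operators": {"sqlite": None, "oracle": None, "docker": None},
    "hooks": {"hdfs": None, "sqlite": None},
    "sensors": {"http": None, "sql": None},
}


def import_paths(
    tree: Dict[str, Optional[dict]], parent: str = "airflow.providers"
) -> Iterator[str]:
    """Yield the dotted import path of every package and module in `tree`."""
    for name in sorted(tree):
        path = f"{parent}.{name}"
        yield path
        children = tree[name]
        if children:  # non-empty package: descend into it
            yield from import_paths(children, path)


# Print every operator package/module to show the different depths.
for path in import_paths(LAYOUT):
    if path.endswith(".operators") or ".operators." in path:
        print(path)
```

The google/amazon/microsoft operator packages come out two levels deeper than the generic `airflow.providers.operators` package, which is the unevenness discussed in this thread.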
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> J.
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Do we need to include `-backport`? What was the thinking behind
>> >>>>>>>>>>>> that?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I think software and protocol should be merged. I would also say
>> >>>>>>>>>>>> _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
>> >>>>>>>>>>>> for instance, is what I would prefer.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> -a
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>> One more day to go. I would love to see some opinions on this
>> >>>>>>>>>>>>> AIP-21 update :).
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Executive summary:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> * we will be moving a number of integrations to sub-packages of
>> >>>>>>>>>>>>> airflow.
>> >>>>>>>>>>>>> * they will be backportable to 1.10.*. There will be
>> >>>>>>>>>>>>> 'apache-airflow-[package]-backport' packages, pip-installable with
>> >>>>>>>>>>>>> Python 3, that will make Airflow 2.0 operators/hooks etc. available
>> >>>>>>>>>>>>> in Airflow 1.10.*.
>> >>>>>>>>>>>>> * the current proposal for sub-packages is
>> >>>>>>>>>>>>> "protocols/software/providers/" (but if you think merging
>> >>>>>>>>>>>>> protocols and software makes sense, please express your opinion).
>> >>>>>>>>>>>>> * we are not moving "fundamental" operators/hooks etc.
>> >>>>>>>>>>>>> * Airflow 2.0 is still going to be installed as a single package
>> >>>>>>>>>>>>> with all operators (so we are not yet implementing AIP-8).
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> J.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I think all these cases are valid, but maybe I was not
>> >>>>>>>>>>>>>> super-clear. It's only the transfer operators that we need to
>> >>>>>>>>>>>>>> decide where to put, not hooks.
>> >>>>>>>>>>>>>> Usually the complexity of communication with particular storages
>> >>>>>>>>>>>>>> is (or at least should be) in the Hooks rather than the Operators.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Operators should just be thin wrappers over the logic in the
>> >>>>>>>>>>>>>> hooks. Hooks are going to stay where they belong: S3 Hooks in
>> >>>>>>>>>>>>>> amazon, GCS Hooks in google.cloud, GoogleSheet Hooks in
>> >>>>>>>>>>>>>> google.gsuite.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Since we actually have a mono-repo, it will be no problem (and no
>> >>>>>>>>>>>>>> cross-dependency problem) to have the S3 -> GCS operator in google
>> >>>>>>>>>>>>>> and use hooks from both google and amazon.
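A minimal sketch of that division of labour, using in-memory stand-ins rather than the real Airflow hook classes (all class and method names here are hypothetical): the transfer operator that would live in the google package owns no storage logic and simply wires an amazon-side hook to a google-side hook.

```python
from typing import Dict, List


class S3HookSketch:
    """Hypothetical stand-in for an S3 hook in the amazon package (in-memory)."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(key for key in self._objects if key.startswith(prefix))

    def read(self, key: str) -> bytes:
        return self._objects[key]


class GCSHookSketch:
    """Hypothetical stand-in for a GCS hook in google.cloud (in-memory)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def upload(self, name: str, data: bytes) -> None:
        self.objects[name] = data


class S3ToGCSOperatorSketch:
    """Transfer operator that would live in the google provider.

    It contains no storage-specific code: listing, reading and uploading are
    all delegated to the hooks, so each hook stays in its own package.
    """

    def __init__(
        self, source_hook: S3HookSketch, target_hook: GCSHookSketch, prefix: str
    ) -> None:
        self.source_hook = source_hook
        self.target_hook = target_hook
        self.prefix = prefix

    def execute(self) -> List[str]:
        copied = []
        for key in self.source_hook.list_keys(self.prefix):
            self.target_hook.upload(key, self.source_hook.read(key))
            copied.append(key)
        return copied


s3 = S3HookSketch({"data/a.csv": b"1", "data/b.csv": b"2", "logs/x.txt": b"3"})
gcs = GCSHookSketch()
print(S3ToGCSOperatorSketch(s3, gcs, prefix="data/").execute())  # ['data/a.csv', 'data/b.csv']
```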
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I hope this alleviates your concern, Daniel?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> J.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would
>> >>>>>>>>>>>>>>> put in the target, i.e. the storage? But GoogleSheetsToSftp would
>> >>>>>>>>>>>>>>> be in the google sheets operators file? The complexity, and the
>> >>>>>>>>>>>>>>> shared code, are in the gsheet component -- not in the storage
>> >>>>>>>>>>>>>>> destination.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <Ja...@polidea.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Hello Airflow Community,
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> This email calls for a vote to update AIP-21 Changes in import paths
>> >>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
>> >>>>>>>>>>>>>>>> with the changes described below. The vote will last till Saturday
>> >>>>>>>>>>>>>>>> 8th 2am CEST (72 hours). Committers have a binding vote, but
>> >>>>>>>>>>>>>>>> everyone from the community is encouraged to cast an advisory vote.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *Summary*:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The proposal is to update AIP-21 to move all non-core
>> >>>>>>>>>>>>>>>> operators/hooks/sensors (and related files) to sub-packages within
>> >>>>>>>>>>>>>>>> airflow (protocols/software/providers) or (software/providers).
>> >>>>>>>>>>>>>>>> I am also happy to merge protocols+software, so if you have a
>> >>>>>>>>>>>>>>>> strong opinion on it, please state it with your vote and we can
>> >>>>>>>>>>>>>>>> decide based on majority.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Those packages will be released separately (schedule/process TBD)
>> >>>>>>>>>>>>>>>> and will be backportable to the 1.10.* Airflow series, so that
>> >>>>>>>>>>>>>>>> users can install them and start using new Airflow 2.0 operators in
>> >>>>>>>>>>>>>>>> their Python 3 Airflow 1.10 environments (only Python 3.5+ is
>> >>>>>>>>>>>>>>>> supported).
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> We will proceed with migrating the providers package to the already
>> >>>>>>>>>>>>>>>> agreed paths without waiting for the final vote (following the
>> >>>>>>>>>>>>>>>> current version of AIP-21). Since we have a working POC, we know the
>> >>>>>>>>>>>>>>>> agreed paths will work for us.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *Previous discussions:*
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>   - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>> >>>>>>>>>>>>>>>>   - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> *More Details*:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> 1) We are going in the direction of AIP-8, but not reaching it yet;
>> >>>>>>>>>>>>>>>> the focus is on separating out backportable packages installable in
>> >>>>>>>>>>>>>>>> Airflow 1.10.* releases. Airflow 2.0 will still be installed as a
>> >>>>>>>>>>>>>>>> whole and all the source will be kept in one repo, but we now have
>> >>>>>>>>>>>>>>>> a way to build backportable packages for groups of operators. POC
>> >>>>>>>>>>>>>>>> available here: https://github.com/apache/airflow/pull/6507 (based
>> >>>>>>>>>>>>>>>> on Ash's https://github.com/ashb/airflow-submodule-test)
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> 2) We move all integrations to new packages (keeping deprecated
>> >>>>>>>>>>>>>>>> import aliases in the old places), using the following split
>> >>>>>>>>>>>>>>>> (according to "stewardship" over the integrations):
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>   - *fundamentals* - core of Airflow - they are really part of
>> >>>>>>>>>>>>>>>>     Apache Airflow. Stewards: the core Airflow team. Not
>> >>>>>>>>>>>>>>>>     backportable/separated out.
>> >>>>>>>>>>>>>>>>   - *protocols* - are not owned by anyone; they are public and the
>> >>>>>>>>>>>>>>>>     implementation is fully "open". There are no particular
>> >>>>>>>>>>>>>>>>     stewards (no need). Users of particular protocols should mainly
>> >>>>>>>>>>>>>>>>     maintain those and add support for different versions of the
>> >>>>>>>>>>>>>>>>     protocols.
>> >>>>>>>>>>>>>>>>   - *software* - both API and software are controlled by someone
>> >>>>>>>>>>>>>>>>     outside of Airflow (commercial or open-source project), but the
>> >>>>>>>>>>>>>>>>     deployment of that software is "owned" by the user installing
>> >>>>>>>>>>>>>>>>     Airflow. The stewards might also be the users, but the
>> >>>>>>>>>>>>>>>>     controlling party (Oracle, for example) might be interested in
>> >>>>>>>>>>>>>>>>     maintaining those operators as well.
>> >>>>>>>>>>>>>>>>   - *providers* - API/software/deployments are fully controlled by
>> >>>>>>>>>>>>>>>>     a 3rd party. Here the "provider" will most likely be interested
>> >>>>>>>>>>>>>>>>     in maintaining the operators (and, like Google for example,
>> >>>>>>>>>>>>>>>>     provide integration guidelines
>> >>>>>>>>>>>>>>>>     <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
>> >>>>>>>>>>>>>>>>     for their hooks/operators/sensors)
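Point 2 above keeps deprecated import aliases at the old paths. A minimal sketch of one way to do that, assuming a shim module that re-exports the class and emits a `DeprecationWarning`: to keep it self-contained, both modules are built in memory with `types.ModuleType`, and the names `new_location_demo` and `old_location_demo` are hypothetical stand-ins for, e.g., `airflow.providers.kubernetes.operators.pod` and the old `airflow.contrib.operators.kubernetes_pod_operator`. In a real tree the shim would simply be the `.py` file left at the old path.

```python
import sys
import types
import warnings

# Stand-in for the new provider module (hypothetical flat name for the demo).
new_module = types.ModuleType("new_location_demo")


class KubernetesPodOperator:
    """Placeholder for the real operator class."""


new_module.KubernetesPodOperator = KubernetesPodOperator
sys.modules["new_location_demo"] = new_module

# What the file at the old import path would contain: a re-export plus a
# warning that fires when the old module is first imported.
OLD_MODULE_SOURCE = '''
import warnings

from new_location_demo import KubernetesPodOperator  # noqa: F401

warnings.warn(
    "This module is deprecated. Please use `new_location_demo`.",
    DeprecationWarning,
    stacklevel=2,
)
'''


def load_old_module() -> types.ModuleType:
    """Build the old-path alias module by executing the shim source."""
    old_module = types.ModuleType("old_location_demo")
    exec(OLD_MODULE_SOURCE, old_module.__dict__)
    sys.modules["old_location_demo"] = old_module
    return old_module


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = load_old_module()

# The old path hands back the very same class object as the new one.
print(old.KubernetesPodOperator is KubernetesPodOperator)  # True
print(caught[0].category.__name__)  # DeprecationWarning
```

Because the alias re-exports the same object, `isinstance` checks and subclassing keep working for DAGs that still use the old import.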
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> 3) Between-providers transfer operators should be kept at the
>> >>>>>>>>>>>>>>>> "target" rather than the "source".
>> >>>>>>>>>>>>>>>> For example, S3 -> GCS should be in the "google" provider, but
>> >>>>>>>>>>>>>>>> GCS -> S3 should be in "amazon".
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> 4) One-side provider transfer operators should be kept at the
>> >>>>>>>>>>>>>>>> "provider", regardless of whether it is the target or the source.
>> >>>>>>>>>>>>>>>> For example, GCS -> SFTP or SFTP -> GCS should be in the "google"
>> >>>>>>>>>>>>>>>> provider.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> 5) If in doubt, we will discuss individual cases separately.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> J.
>> >>>>>>>>>>>>>>>>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
Ok. Happy to move it back then :). No problem with that.

According to the rules of AIP-21 it should actually be: "*from
airflow.providers.kubernetes.operators.pod import KubernetesPodOperator*"
(Case 2A (drop _operator in module name) + Case 5B (keep Operator in
class name)). We can have more than just a Pod operator for Kubernetes
(KubernetesPod, KubernetesVolume, KubernetesIstio and many more), so
keeping KubernetesPod in the class name and having a separate module for
pod operator(s?) makes sense IMHO.

It's similar to *from airflow.providers.google.cloud.operators.pubsub
import PubSubTopicCreateOperator* for example.
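
To make the two AIP-21 cases concrete, here is a minimal sketch of how an old contrib module name could map to the new path. It is hypothetical and not part of Airflow: the helpers `new_module_name` and `new_import_path` are made up, and the step that strips a prefix repeating the provider name (`kubernetes_pod` -> `pod`) is an assumption inferred from the two examples above. Under Case 5B the class name stays as it is, so only the module path is computed.

```python
OPERATOR_SUFFIX = "_operator"


def new_module_name(old_module: str, provider: str) -> str:
    """Derive an AIP-21 style module name from an old contrib module name (sketch)."""
    name = old_module
    # Case 2A: drop the "_operator" suffix from the module name.
    if name.endswith(OPERATOR_SUFFIX):
        name = name[: -len(OPERATOR_SUFFIX)]
    # Assumption: also drop a prefix that only repeats the provider's name,
    # e.g. "kubernetes_pod" -> "pod" inside the kubernetes provider.
    prefix = provider.split(".")[-1] + "_"
    if name.startswith(prefix):
        name = name[len(prefix):]
    return name


def new_import_path(old_module: str, provider: str, kind: str = "operators") -> str:
    """Full dotted module path under airflow.providers; class names stay as-is (Case 5B)."""
    module = new_module_name(old_module, provider)
    return ".".join(["airflow", "providers", provider, kind, module])


print(new_import_path("kubernetes_pod_operator", "kubernetes"))
# airflow.providers.kubernetes.operators.pod
print(new_import_path("pubsub_operator", "google.cloud"))
# airflow.providers.google.cloud.operators.pubsub
```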

Re: remote log storage, indeed. That should be part of AIP-8.
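
The transfer-operator placement rules discussed in this thread can be sketched as a small decision function: between two providers the "to" side wins, and when the other side is a public protocol such as SFTP the operator stays with the provider in either direction (Apache projects count as a provider, per Kamil's suggestion). This is a hypothetical illustration, not Airflow code: the service-to-provider mapping and the set of public protocols are assumptions, and sending protocol -> protocol transfers to the target is also an assumption, since the thread does not cover that case.

```python
# Illustrative mapping from a service name to its top-level provider package.
# These entries are assumptions for the example, not an agreed list.
PROVIDER_OF = {
    "s3": "amazon",
    "gcs": "google",
    "gsheets": "google",
    "hive": "apache",
    "vertica": "vertica",
    "sftp": "sftp",
    "ftp": "ftp",
}

# Integrations treated as public protocols (also an illustrative assumption).
PUBLIC_PROTOCOLS = frozenset({"sftp", "ftp", "ssh", "http", "smtp", "imap"})


def transfer_operator_package(source: str, target: str) -> str:
    """Return the provider package for a source -> target transfer operator."""
    src = PROVIDER_OF.get(source, source)
    dst = PROVIDER_OF.get(target, target)
    if dst in PUBLIC_PROTOCOLS and src not in PUBLIC_PROTOCOLS:
        # Provider -> public protocol: keep it with the provider, even as the source.
        return src
    # Otherwise the "to" side wins: provider -> provider, protocol -> provider,
    # and protocol -> protocol (an assumption: the thread does not cover it).
    return dst


for source, target in [("s3", "gcs"), ("gcs", "s3"), ("sftp", "gcs"),
                       ("gcs", "sftp"), ("s3", "hive"), ("gsheets", "s3")]:
    print(source, "->", target, ":", transfer_operator_package(source, target))
```

Run as-is, it prints the package chosen for a few pairs mentioned in the thread (S3 -> GCS, GCS -> SFTP, S3 -> Hive, Google Sheets -> S3).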

J,

On Mon, Nov 11, 2019 at 6:36 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> +1 for Python and Bash being in the stock install -- they are just _so_
> commonly used that I think it makes sense to keep them in the base install.
> (And the virtualenv module is not an onerous dependency; it has not caused
> us any problems. Yet.)
>
> Kubernetes is also a slightly funny one, since the deps for that will be
> in "core" anyway thanks to the Kube executor, but I think it probably makes
> sense to have `from airflow.providers.kubernetes.operators import
> KubernetesOperator`. Is that the pattern we are going with for the
> "one-level" providers, or will it be `from
> airflow.providers.kubernetes.operators.pod_operator import
> KubernetesOperator`?
>
> Possibly more an AIP-8 question: with moving Azure Blob/S3/GCS to separate
> packages we might have to look at how we enable remote log storage.
>
> -a
>
>
> > On 11 Nov 2019, at 15:53, Jarek Potiuk <Ja...@polidea.com> wrote:
> >
> > On Mon, Nov 11, 2019 at 4:22 PM Kamil Breguła <kamil.bregula@polidea.com>
> > wrote:
> >
> >> One more question. Are you sure you want to move Python and Bash out of
> >> core? These are the elements that are installed in every environment
> >> because they are required by Airflow, so moving them to a separate
> >> installed package is pointless in my opinion.
> >>
> > I have no problem with moving them to "fundamentals", but I am not sure
> > they are really required. I looked through the code and, other than a few
> > examples and tests, they are not really "required". Maybe that's enough to
> > keep them in fundamentals.
> > Also, the Python operator has a dependency (virtualenv) that is only
> > required for this operator, so maybe it's worth keeping it separate from
> > "fundamentals".
> >
> >
> >> On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com> wrote:
> >>>
> >>> I am fine with this list +1
> >>>
> >>> On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>> wrote:
> >>>
> >>>> I am all for it, Kamil!
> >>>>
> >>>> Super happy to treat Apache projects in the same way as "proprietary"
> >>>> providers :). Does anyone else have other comments?
> >>>>
> >>>> J.
> >>>>
> >>>> On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.bregula@polidea.com>
> >>>> wrote:
> >>>>
> >>>>> I looked at this list and I'm only worried about two operators.
> >>>>>
> >>>>> airflow.contrib.operators.vertica_to_hive
> >>>>> airflow.contrib.operators.s3_to_hive
> >>>>>
> >>>>> If we want the operators to be grouped according to destination, then
> >>>>> these operators should be in the apache package. It is the members of
> >>>>> the Apache community who will care most about these operators being of
> >>>>> high quality. Apache can be treated equally with other large cloud
> >>>>> providers, such as GCP and AWS. I can imagine that a new Apache product
> >>>>> will appear and will want to be promoted the same way as the products
> >>>>> of cloud providers are promoted: by creating a large number of
> >>>>> integrations that allow you to copy data into it.
> >>>>> There is another reason: building a strong Apache community. As
> >>>>> members of the Apache community, we should promote Apache products to
> >>>>> ensure that the community develops well, and that includes integrating
> >>>>> our products with other products.
> >>>>>
> >>>>> On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Just to decide on the "packages" for this update: does anyone have
> >>>>>> objections to this structure? (Details, including transfer operators,
> >>>>>> are in
> >>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0 .)
> >>>>>>
> >>>>>> *Fundamentals (no change)*
> >>>>>>
> >>>>>> providers
> >>>>>>     google
> >>>>>>         cloud
> >>>>>>         gsuite
> >>>>>>         marketing_platform
> >>>>>>     amazon
> >>>>>>         aws
> >>>>>>     microsoft
> >>>>>>         azure
> >>>>>>     apache
> >>>>>>         cassandra
> >>>>>>         druid
> >>>>>>         hadoop
> >>>>>>         hive
> >>>>>>         pig
> >>>>>>         pinot
> >>>>>>         spark
> >>>>>>         sqoop
> >>>>>>     mysql
> >>>>>>     jira
> >>>>>>     databricks
> >>>>>>     datadog
> >>>>>>     dingding
> >>>>>>     discord
> >>>>>>     cloudant
> >>>>>>     jenkins
> >>>>>>     opsgenie
> >>>>>>     qubole
> >>>>>>     salesforce
> >>>>>>     segment
> >>>>>>     slack
> >>>>>>     snowflake
> >>>>>>     vertica
> >>>>>>     zendesk
> >>>>>>     celery
> >>>>>>     docker
> >>>>>>     bash
> >>>>>>     kubernetes
> >>>>>>     mssql
> >>>>>>     mongodb
> >>>>>>     mysql
> >>>>>>     openfaas
> >>>>>>     oracle
> >>>>>>     papermill
> >>>>>>     postgres
> >>>>>>     presto
> >>>>>>     python
> >>>>>>     redis
> >>>>>>     samba
> >>>>>>     sqlite
> >>>>>>     imap
> >>>>>>     ssh
> >>>>>>     filesystem
> >>>>>>     sftp
> >>>>>>     ftp
> >>>>>>     http
> >>>>>>     grpc
> >>>>>>     smtp
> >>>>>>     jdbc
> >>>>>>     winrm
> >>>>>>
> >>>>>> On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Let me then cancel this vote, and I will restart it next week.
> >>>>>>>
> >>>>>>> Yeah. It's a bit like re-opening Pandora's box, but now that we know
> >>>>>>> that we can do it, and we are unblocked in moving to google (which is
> >>>>>>> now the biggest move in progress), we can spend more time on reaching
> >>>>>>> a better (and more final) consensus.
> >>>>>>> I decided to go through the list from the docs (once again, Kamil:
> >>>>>>> great that you did it) and prepared this spreadsheet showing the
> >>>>>>> structure. I went through ALL the operators and put each one where our
> >>>>>>> current rules place it.
> >>>>>>>
> >>>>>>> After this exercise, I think the following makes sense:
> >>>>>>> - put all the stuff except fundamentals in *"providers"* (everything
> >>>>>>> in "providers" will be potentially backportable).
> >>>>>>> - group Apache projects under *"apache"*, similar to
> >>>>>>> google/amazon/microsoft (a different kind of ownership, but still
> >>>>>>> ownership).
> >>>>>>> - for the rest, I think we can simply put the operators in folders per
> >>>>>>> "service/company" (without sub-packages). That includes sftp/ssh/ftp
> >>>>>>> etc. (should we group [ftp and sftp] or [ssh and sftp]?). There is no
> >>>>>>> "ownership" there and no reason to group them. That will put
> >>>>>>> "operators/hooks/sensors" at different levels in the directory tree,
> >>>>>>> but we already have that for fundamentals and I am not too worried
> >>>>>>> about that. We do not have to have everything at the same level.
> >>>>>>> - I put transfer operators according to the rule that the "to" side is
> >>>>>>> more important unless the other side is a public protocol (so sftp ->
> >>>>>>> gcs and gcs -> sftp both go to google/gcp). I did not have any doubt
> >>>>>>> where to put each transfer operator, so this is a good sign:
> >>>>>>>
> >>>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> >>>>>>>
> >>>>>>> Can you please take a look and express your opinions here, so that we
> >>>>>>> can have a final vote next week (for those who are not yet tired of
> >>>>>>> the discussion ;)).
> >>>>>>>
> >>>>>>> J.
> >>>>>>>
> >>>>>>> On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Yes, that makes sense.
> >>>>>>>>
> >>>>>>>> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hadoop is published by Apache, so it can be in the apache directory.
> >>>>>>>>> This will mimic the grouping presented in the documentation:
> >>>>>>>>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> >>>>>>>>>
> >>>>>>>>> On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I think we should keep the vote open at least until mid next week to
> >>>>>>>>>> gather more thoughts and input on this one.
> >>>>>>>>>>
> >>>>>>>>>> In general, I am happy with the approach, but operators/hooks and
> >>>>>>>>>> sensors shouldn't be treated as providers themselves. "hadoop" can
> >>>>>>>>>> be its own provider and hdfs can be a part of it.
> >>>>>>>>>>
> >>>>>>>>>> providers/
> >>>>>>>>>>    google
> >>>>>>>>>>         cloud
> >>>>>>>>>>             operators
> >>>>>>>>>>             hooks
> >>>>>>>>>>             sensors
> >>>>>>>>>>         gsuite
> >>>>>>>>>>             operators
> >>>>>>>>>>             ...
> >>>>>>>>>>    amazon
> >>>>>>>>>>         aws
> >>>>>>>>>>             operators
> >>>>>>>>>>             ...
> >>>>>>>>>>    microsoft
> >>>>>>>>>>         azure
> >>>>>>>>>>             operators
> >>>>>>>>>>             ...
> >>>>>>>>>>    hadoop
> >>>>>>>>>>        hdfs
> >>>>>>>>>>             operators
> >>>>>>>>>>             ...
> >>>>>>>>>>
> >>>>>>>>>> We can also define what a "provider" is, so we know what to add to
> >>>>>>>>>> it in the future. SSH/FTP/SFTP belong to the same family. Do we want
> >>>>>>>>>> a separate provider for each one of them?
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Kaxil
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I really like making everything a provider. That's a great idea!
> >>>>>>>>>>> This way everything "backportable" will have to be in the
> >>>>>>>>>>> "providers" package. A really nice and clean separation (and less
> >>>>>>>>>>> mess in "airflow"). And we will not need any artificial grouping
> >>>>>>>>>>> (we can still group them at the documentation level).
> >>>>>>>>>>>
> >>>>>>>>>>> We do not need "backport" in the name. I think it's more of a
> >>>>>>>>>>> technical detail of package naming, which we can work out while
> >>>>>>>>>>> reviewing PRs, and we can agree on the final naming of the released
> >>>>>>>>>>> packages at the PMC level (the PMC will have to vote on releasing
> >>>>>>>>>>> them).
> >>>>>>>>>>>
> >>>>>>>>>>> The thinking was that these packages are really only intended to be
> >>>>>>>>>>> backported to 1.10 (we are not going to use them in Airflow 2.*
> >>>>>>>>>>> yet), so I thought that naming them "backport" would express that
> >>>>>>>>>>> intent more clearly.
> >>>>>>>>>>>
> >>>>>>>>>>> So let me clarify the structure of folders we are going to have if
> >>>>>>>>>>> we follow it (I just added some examples), including the already
> >>>>>>>>>>> agreed changes from AIP-21:
> >>>>>>>>>>>
> >>>>>>>>>>> providers/
> >>>>>>>>>>>    google
> >>>>>>>>>>>         cloud
> >>>>>>>>>>>             operators
> >>>>>>>>>>>             hooks
> >>>>>>>>>>>             sensors
> >>>>>>>>>>>         gsuite
> >>>>>>>>>>>             operators
> >>>>>>>>>>>             ...
> >>>>>>>>>>>    amazon
> >>>>>>>>>>>         aws
> >>>>>>>>>>>             operators
> >>>>>>>>>>>             ...
> >>>>>>>>>>>    microsoft
> >>>>>>>>>>>         azure
> >>>>>>>>>>>             operators
> >>>>>>>>>>>             ...
> >>>>>>>>>>>    operators
> >>>>>>>>>>>         sqlite.py
> >>>>>>>>>>>         oracle.py
> >>>>>>>>>>>         docker.py
> >>>>>>>>>>>    hooks
> >>>>>>>>>>>         hdfs.py
> >>>>>>>>>>>         sqlite.py
> >>>>>>>>>>>    sensors
> >>>>>>>>>>>         http.py
> >>>>>>>>>>>         sql.py
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> J.
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Do we need to include `-backport`? What was the thinking behind
> >>>>>>>>>>>> that?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think software and protocol should be merged. I would also say
> >>>>>>>>>>>> _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
> >>>>>>>>>>>> for instance, is what I would prefer.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -a
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>> One more day to go. I would love to see some opinions on this
> >>>>>>>>>>>>> AIP-21 update :).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Executive summary:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * we will be moving a number of integrations to sub-packages of
> >>>>>>>>>>>>> airflow.
> >>>>>>>>>>>>> * they will be backportable to 1.10.*. There will be
> >>>>>>>>>>>>> 'apache-airflow-[package]-backport' packages, installable from
> >>>>>>>>>>>>> PyPI with Python 3, that will make Airflow 2.0 operators/hooks etc.
> >>>>>>>>>>>>> available in Airflow 1.10.* installations.
> >>>>>>>>>>>>> * the current proposal for sub-packages is
> >>>>>>>>>>>>> "protocols/software/providers/" (but if you think merging
> >>>>>>>>>>>>> protocols and software makes sense, please express your opinion).
> >>>>>>>>>>>>> * we are not moving "fundamental" operators/hooks etc.
> >>>>>>>>>>>>> * Airflow 2.0 is still going to be installed as a single package
> >>>>>>>>>>>>> with all operators (so we are not yet implementing AIP-8).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> J.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think all these cases are valid, but maybe I was not
> >>>>>>>>>>>>>> super-clear. It's only the transfer operators that we need to
> >>>>>>>>>>>>>> decide where to put, not hooks.
> >>>>>>>>>>>>>> Usually the complexity of communicating with a particular storage
> >>>>>>>>>>>>>> is (or at least should be) in the Hooks rather than the Operators.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Operators should be just thin wrappers over the logic in the
> >>>>>>>>>>>>>> hooks. Hooks are going to stay where they belong: S3 Hooks in
> >>>>>>>>>>>>>> amazon, GCS Hooks in google.cloud, GoogleSheet Hooks in
> >>>>>>>>>>>>>> google.gsuite.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Since we actually have a mono-repo, it will be no problem (and no
> >>>>>>>>>>>>>> cross-dependency problem) to have the S3 -> GCS operator in
> >>>>>>>>>>>>>> google and use hooks from both google and amazon.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I hope this alleviates your concern, Daniel?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> J.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What about GoogleSheetsToS3? GoogleSheetsToGCS? Would you put
> >>>>>>>>>>>>>>> these in the target, i.e. the storage? But GoogleSheetsToSftp
> >>>>>>>>>>>>>>> would be in the google sheets operators file? The complexity,
> >>>>>>>>>>>>>>> and the shared code, are in the gsheet component -- not in the
> >>>>>>>>>>>>>>> storage destination.
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Ash Berlin-Taylor <as...@apache.org>.
+1 for Python and Bash being in the stock install -- they are just _so_ commonly used that I think it makes sense to keep them in the base install. (And the virtualenv module is not an onerous dependency; it has not caused us any problems. Yet.)

Kubernetes is also a slightly funny one, since the deps for that will be in "core" anyway thanks to the Kube executor, but I think it probably makes sense to have `from airflow.providers.kubernetes.operators import KubernetesOperator`. Is that the pattern we are going with for the "one-level" providers, or will it be `from airflow.providers.kubernetes.operators.pod_operator import KubernetesOperator`?

Possibly more an AIP-8 question: with moving Azure Blob/S3/GCS to separate packages we might have to look at how we enable remote log storage.

-a


> On 11 Nov 2019, at 15:53, Jarek Potiuk <Ja...@polidea.com> wrote:
> 
> On Mon, Nov 11, 2019 at 4:22 PM Kamil Breguła <kamil.bregula@polidea.com>
> wrote:
> 
>> One more question. Are you sure you want to move Python and Bash out of
>> core? These are the elements that are installed in every environment
>> because they are required by Airflow, so moving them to a separate
>> installed package is pointless in my opinion.
>> 
> I have no problem with moving them to "fundamentals", but I am not sure
> they are really required. I looked through the code and, other than a few
> examples and tests, they are not really "required". Maybe that's enough to
> keep them in fundamentals.
> Also, the Python operator has a dependency (virtualenv) that is only
> required for this operator, so maybe it's worth keeping it separate from
> "fundamentals".
> 
> 
>> On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com> wrote:
>>> 
>>> I am fine with this list +1
>>> 
>>> On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Ja...@polidea.com>
>>> wrote:
>>> 
>>>> I am all for it, Kamil!
>>>> 
>>>> Super happy to treat Apache projects in the same way as "proprietary"
>>>> providers :). Does anyone else have other comments?
>>>> 
>>>> J.
>>>> 
>>>> On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.bregula@polidea.com>
>>>> wrote:
>>>> 
>>>>> I looked at this list and I'm only worried about two operators.
>>>>> 
>>>>> airflow.contrib.operators.vertica_to_hive
>>>>> airflow.contrib.operators.s3_to_hive
>>>>> 
>>>>> If we want the operators to be grouped according to destination, then
>>>>> these operators should be in the apache package. It is the members of
>>>>> the Apache community who will care most about these operators being of
>>>>> high quality. Apache can be treated equally with other large cloud
>>>>> providers, such as GCP and AWS. I can imagine that a new Apache product
>>>>> will appear and will want to be promoted the same way as the products
>>>>> of cloud providers are promoted: by creating a large number of
>>>>> integrations that allow you to copy data into it.
>>>>> There is another reason: building a strong Apache community. As
>>>>> members of the Apache community, we should promote Apache products to
>>>>> ensure that the community develops well, and that includes integrating
>>>>> our products with other products.
>>>>> 
>>>>> On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>> wrote:
>>>>>> 
>>>>>> Just to decide on the "packages" for this update: does anyone have
>>>>>> objections to this structure? (Details, including transfer operators,
>>>>>> are in
>>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0 .)
>>>>>> 
>>>>>> *Fundamentals (no change)*
>>>>>>
>>>>>> providers
>>>>>>     google
>>>>>>         cloud
>>>>>>         gsuite
>>>>>>         marketing_platform
>>>>>>     amazon
>>>>>>         aws
>>>>>>     microsoft
>>>>>>         azure
>>>>>>     apache
>>>>>>         cassandra
>>>>>>         druid
>>>>>>         hadoop
>>>>>>         hive
>>>>>>         pig
>>>>>>         pinot
>>>>>>         spark
>>>>>>         sqoop
>>>>>>     mysql
>>>>>>     jira
>>>>>>     databricks
>>>>>>     datadog
>>>>>>     dingding
>>>>>>     discord
>>>>>>     cloudant
>>>>>>     jenkins
>>>>>>     opsgenie
>>>>>>     qubole
>>>>>>     salesforce
>>>>>>     segment
>>>>>>     slack
>>>>>>     snowflake
>>>>>>     vertica
>>>>>>     zendesk
>>>>>>     celery
>>>>>>     docker
>>>>>>     bash
>>>>>>     kubernetes
>>>>>>     mssql
>>>>>>     mongodb
>>>>>>     mysql
>>>>>>     openfaas
>>>>>>     oracle
>>>>>>     papermill
>>>>>>     postgres
>>>>>>     presto
>>>>>>     python
>>>>>>     redis
>>>>>>     samba
>>>>>>     sqlite
>>>>>>     imap
>>>>>>     ssh
>>>>>>     filesystem
>>>>>>     sftp
>>>>>>     ftp
>>>>>>     http
>>>>>>     grpc
>>>>>>     smtp
>>>>>>     jdbc
>>>>>>     winrm
>>>>>>
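The package list above is a tree in which each leaf is a folder that would hold its own operators, hooks and sensors. A minimal Python sketch of what that implies for import paths, assuming every leaf becomes a sub-package of `airflow.providers` (the prefix Ash uses later in this thread) with `operators`, `hooks` and `sensors` modules inside. `PROVIDERS_TREE` is a hand-copied subset of the list, and `leaf_packages`/`module_paths` are hypothetical helpers, not Airflow code.

```python
from typing import Dict, Iterator, List

# A subset of the proposed layout: nested dicts, an empty dict marks a leaf package.
PROVIDERS_TREE: Dict[str, dict] = {
    "google": {"cloud": {}, "gsuite": {}, "marketing_platform": {}},
    "amazon": {"aws": {}},
    "apache": {"hive": {}, "spark": {}},
    "slack": {},
    "sftp": {},
}


def leaf_packages(tree: Dict[str, dict], prefix: str = "airflow.providers") -> Iterator[str]:
    """Yield the dotted path of every leaf package, depth-first, in insertion order."""
    for name, children in tree.items():
        path = f"{prefix}.{name}"
        if children:
            yield from leaf_packages(children, path)
        else:
            yield path


def module_paths(tree: Dict[str, dict]) -> List[str]:
    """Expand each leaf into operators/hooks/sensors modules (an assumed layout)."""
    return [
        f"{leaf}.{kind}"
        for leaf in leaf_packages(tree)
        for kind in ("operators", "hooks", "sensors")
    ]


if __name__ == "__main__":
    for path in module_paths(PROVIDERS_TREE):
        print(path)
```

Running it prints paths such as `airflow.providers.google.cloud.operators` next to `airflow.providers.slack.operators`, which is one level shallower: the "operators/hooks/sensors at different levels in the directory tree" trade-off that the Nov 8 message below accepts.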
>>>>>> 
>>>>>> On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Let me then cancel this vote and I will restart it next week.
>>>>>>>
>>>>>>> Yeah. It's a bit like re-opening Pandora's box, but now that we know
>>>>>>> that we can do it, and we are unblocked in moving to google (which is
>>>>>>> now the biggest move in progress), we can spend more time on getting
>>>>>>> better (and more final) consensus.
>>>>>>> I decided to go through the list from the docs (once again, Kamil,
>>>>>>> great that you did it) and prepared this spreadsheet showing the
>>>>>>> structure. I went through ALL the operators and put each of them where
>>>>>>> our current rules place it.
>>>>>>> 
>>>>>>> After this exercise, I think the following makes sense:
>>>>>>> - put all the stuff except fundamentals in *"providers"* (everything
>>>>>>> in "providers" will be potentially backportable).
>>>>>>> - group apache projects under *"apache"*, similar to
>>>>>>> google/amazon/microsoft (a different kind of ownership, but still an
>>>>>>> ownership).
>>>>>>> - for the rest, I think we can really put the operators in folders per
>>>>>>> "service/company" (without sub-packages). That includes sftp/ssh/ftp
>>>>>>> etc. (should we group [ftp and sftp] or [ssh and sftp]?). There is no
>>>>>>> "ownership" there and no reason to group them. That will put
>>>>>>> "operators/hooks/sensors" at different levels in the directory tree,
>>>>>>> but we already have that for fundamentals and I am not too worried
>>>>>>> about that. We do not have to have everything at the same level.
>>>>>>> - I placed transfer operators according to the rule that the "to" side
>>>>>>> is more important unless the other side is a public protocol (so
>>>>>>> sftp -> gcs and gcs -> sftp both go to google/gcp). I did not have any
>>>>>>> doubt about where to put each transfer operator, so this is a good sign:
>>>>>>>
>>>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
>>>>>>> 
>>>>>>> Can you please take a look and express your opinions here so that we
>>>>>>> can have the final vote next week (for those who are not yet tired of
>>>>>>> the discussion ;)).
>>>>>>> 
>>>>>>> J.
>>>>>>> 
>>>>>>> On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Yes, that makes sense.
>>>>>>>> 
>>>>>>>> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hadoop is published by Apache, so it can be in the apache
>>>>>>>>> directory. This will mimic the grouping presented in the
>>>>>>>>> documentation:
>>>>>>>>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
>>>>>>>>> 
>>>>>>>>> On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> I think we should keep the vote open at least until the middle of
>>>>>>>>>> next week to get more thoughts and input on this one.
>>>>>>>>>>
>>>>>>>>>> In general, I am happy with the approach, but operators/hooks and
>>>>>>>>>> sensors shouldn't be a provider. "hadoop" can be a provider and hdfs
>>>>>>>>>> can be a part of it.
>>>>>>>>>> 
>>>>>>>>>> providers/
>>>>>>>>>>    google
>>>>>>>>>>         cloud
>>>>>>>>>>             operators
>>>>>>>>>>             hooks
>>>>>>>>>>             sensors
>>>>>>>>>>         gsuite
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>    amazon
>>>>>>>>>>         aws
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>    microsoft
>>>>>>>>>>         azure
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>    hadoop
>>>>>>>>>>        hdfs
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>> 
>>>>>>>>>> We can also define what a "provider" is, so we know what to add to
>>>>>>>>>> it in the future. SSH/FTP/SFTP belong to the same family. Do we want
>>>>>>>>>> to have separate providers for each one of them?
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Kaxil
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I really like making everything a provider. That's a great idea!
>>>>>>>>>>> This way everything "backportable" will have to be in the
>>>>>>>>>>> "providers" package. Really nice and clean separation (and less
>>>>>>>>>>> mess in "airflow"). And we will not need any artificial grouping
>>>>>>>>>>> (we can still group them at the documentation level).
>>>>>>>>>>>
>>>>>>>>>>> We do not need "backport" in the name. I think it's more of a
>>>>>>>>>>> technical detail of naming the package, which we can work out while
>>>>>>>>>>> reviewing PRs, and we can agree on the final naming of the released
>>>>>>>>>>> packages at the PMC level (the PMC will have to vote on releasing
>>>>>>>>>>> those).
>>>>>>>>>>>
>>>>>>>>>>> The thinking was that their intention is really only to be
>>>>>>>>>>> backported to 1.10; we are not going (yet) to use the packages in
>>>>>>>>>>> Airflow 2.*, so I thought that by naming them "backport" we could
>>>>>>>>>>> express that intent more clearly.
>>>>>>>>>>> 
>>>>>>>>>>> So let me clarify the folder structure we are going to have if we
>>>>>>>>>>> follow it (I just added some examples), including the already
>>>>>>>>>>> agreed changes from AIP-21:
>>>>>>>>>>> 
>>>>>>>>>>> providers/
>>>>>>>>>>>    google
>>>>>>>>>>>         cloud
>>>>>>>>>>>             operators
>>>>>>>>>>>             hooks
>>>>>>>>>>>             sensors
>>>>>>>>>>>         gsuite
>>>>>>>>>>>             operators
>>>>>>>>>>>             ...
>>>>>>>>>>>    amazon
>>>>>>>>>>>         aws
>>>>>>>>>>>             operators
>>>>>>>>>>>             ...
>>>>>>>>>>>    microsoft
>>>>>>>>>>>         azure
>>>>>>>>>>>             operators
>>>>>>>>>>>             ...
>>>>>>>>>>>    operators
>>>>>>>>>>>         sqlite.py
>>>>>>>>>>>         oracle.py
>>>>>>>>>>>         docker.py
>>>>>>>>>>>    hooks
>>>>>>>>>>>         hdfs.py
>>>>>>>>>>>         sqlite.py
>>>>>>>>>>>    sensors
>>>>>>>>>>>         http.py
>>>>>>>>>>>         sql.py
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> J.
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Do we need to include `-backport`? What was the thinking behind
>>>>>>>>>>>> that?
>>>>>>>>>>>>
>>>>>>>>>>>> I think software and protocol should be merged. I would also say
>>>>>>>>>>>> _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
>>>>>>>>>>>> for instance, is what I would prefer.
>>>>>>>>>>>> 
>>>>>>>>>>>> -a
>>>>>>>>>>>> 
>>>>>>>>>>>> On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> One more day to go. I would love to see some opinions on this
>>>>>>>>>>>>> AIP-21 update :).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Executive summary:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> * we will be moving a number of integrations to sub-packages of
>>>>>>>>>>>>> airflow.
>>>>>>>>>>>>> * they will be backportable to 1.10.*. There will be
>>>>>>>>>>>>> 'apache-airflow-[package]-backport' packages, installable from PyPI
>>>>>>>>>>>>> with Python 3, that will make Airflow 2.0 operators/hooks etc.
>>>>>>>>>>>>> available alongside the 1.10.* operators.
>>>>>>>>>>>>> * the current proposal for sub-packages is
>>>>>>>>>>>>> "protocols/software/providers/" (but if you think merging
>>>>>>>>>>>>> protocols and software makes sense, please express your opinion).
>>>>>>>>>>>>> * we are not moving "fundamental" operators/hooks etc.
>>>>>>>>>>>>> * Airflow 2.0 is still going to be installed as a single package
>>>>>>>>>>>>> with all operators (so we are not yet implementing AIP-8).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> J.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think all these cases are valid, but maybe I was not super
>>>>>>>>>>>>>> clear. It's only the transfer operators whose location we need
>>>>>>>>>>>>>> to decide, not the hooks.
>>>>>>>>>>>>>> Usually the complexity of communicating with a particular storage
>>>>>>>>>>>>>> is (or at least should be) in the hooks rather than the operators.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Operators should be just thin wrappers over the logic in the
>>>>>>>>>>>>>> hooks. Hooks are going to stay where they belong: S3 hooks in
>>>>>>>>>>>>>> amazon, GCS hooks in google.cloud, GoogleSheet hooks in
>>>>>>>>>>>>>> google.gsuite.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since we actually have a mono-repo, it will be no problem (and
>>>>>>>>>>>>>> there will be no cross-dependency problem) to have the S3 -> GCS
>>>>>>>>>>>>>> operator in google and use hooks from both google and amazon.
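A minimal Python sketch of "operators as thin wrappers over hooks", assuming in-memory stand-ins so it runs without Airflow: all storage logic sits in the two hook classes and the transfer operator only connects them. `InMemoryS3Hook`, `InMemoryGCSHook` and `S3ToGCSCopy` are hypothetical; Airflow's real hooks and operators have different names and signatures, and real operators typically receive connection ids and create their hooks inside `execute()`.

```python
from typing import Dict, Iterable


class InMemoryS3Hook:
    """Hypothetical source hook: owns all of the "how to read from S3" logic."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def read(self, key: str) -> bytes:
        return self._objects[key]


class InMemoryGCSHook:
    """Hypothetical target hook: owns all of the "how to write to GCS" logic."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class S3ToGCSCopy:
    """Hypothetical transfer operator: no storage logic, only glue between two hooks."""

    def __init__(
        self,
        keys: Iterable[str],
        source_hook: InMemoryS3Hook,
        target_hook: InMemoryGCSHook,
        dest_prefix: str = "",
    ) -> None:
        self.keys = list(keys)
        self.source_hook = source_hook
        self.target_hook = target_hook
        self.dest_prefix = dest_prefix

    def execute(self) -> int:
        # Read each object through one hook and write it through the other.
        for key in self.keys:
            self.target_hook.write(self.dest_prefix + key, self.source_hook.read(key))
        return len(self.keys)


s3 = InMemoryS3Hook({"a.csv": b"1,2", "b.csv": b"3,4"})
gcs = InMemoryGCSHook()
S3ToGCSCopy(["a.csv", "b.csv"], s3, gcs, dest_prefix="landing/").execute()
```

Because the operator only calls `read()` and `write()`, moving it between the google and amazon packages changes its import path, not its logic.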
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I hope this alleviates your concern, Daniel?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would
>>>>>>>>>>>>>>> put in the target, i.e. the storage? But GoogleSheetsToSftp would
>>>>>>>>>>>>>>> be in the google sheets operators file? The complexity, and the
>>>>>>>>>>>>>>> shared code, are in the gsheet component, not in the storage
>>>>>>>>>>>>>>> destination.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <Ja...@polidea.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hello Airflow Community,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This email calls for a vote to update AIP-21 Changes in import
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
>>>>>>>>>>>>>>>> with the changes described below. The vote will last until
>>>>>>>>>>>>>>>> Saturday 8th 2am CEST (72 hours). Committers have a binding vote,
>>>>>>>>>>>>>>>> but everyone from the community is encouraged to cast an
>>>>>>>>>>>>>>>> advisory vote.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> *Summary*:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The proposal is to update AIP-21 to move all non-core
>>>>>>>>>>>>>>>> operators/hooks/sensors (and related files) to sub-packages
>>>>>>>>>>>>>>>> within airflow (protocols/software/providers) or
>>>>>>>>>>>>>>>> (software/providers).
>>>>>>>>>>>>>>>> I am also happy to merge protocols+software, so if you have a
>>>>>>>>>>>>>>>> strong opinion on it, please state it with your vote and we can
>>>>>>>>>>>>>>>> decide based on the majority.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> These packages will be released separately (schedule/process
>>>>>>>>>>>>>>>> TBD) and will be backportable to the Airflow 1.10.* series, so
>>>>>>>>>>>>>>>> that users can install them and start using the new Airflow 2.0
>>>>>>>>>>>>>>>> operators in their Python 3 Airflow 1.10 environments (only
>>>>>>>>>>>>>>>> Python 3.5+ is supported).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We will proceed with migrating the providers package to the
>>>>>>>>>>>>>>>> already agreed paths without waiting for the final vote
>>>>>>>>>>>>>>>> (following the current version of AIP-21). Since we have a
>>>>>>>>>>>>>>>> working POC, we know the agreed paths will work for us.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> *Previous discussions:*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>>>>>>>>>>>>>>>>   - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> *More Details*:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 1) We are going in the direction of AIP-8 but not yet reaching
>>>>>>>>>>>>>>>> it, focusing on separating out backportable packages installable
>>>>>>>>>>>>>>>> in Airflow 1.10.* releases. Airflow 2.0 will still be installed
>>>>>>>>>>>>>>>> as a whole and all the source will be kept in one repo, but we
>>>>>>>>>>>>>>>> now have a way to build backportable packages for groups of
>>>>>>>>>>>>>>>> operators. A POC is available here:
>>>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/6507 (based on Ash's
>>>>>>>>>>>>>>>> https://github.com/ashb/airflow-submodule-test)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2) We move all integrations to new packages (keeping deprecated
>>>>>>>>>>>>>>>> import aliases in the old places), split according to
>>>>>>>>>>>>>>>> "stewardship" over the integrations:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   - *fundamentals* - the core of airflow; they are really part
>>>>>>>>>>>>>>>>   of Apache Airflow. Stewards: the core Airflow team. Not
>>>>>>>>>>>>>>>>   backportable/separated out.
>>>>>>>>>>>>>>>>   - *protocols* - not owned by anyone; they are public and the
>>>>>>>>>>>>>>>>   implementation is fully "open". There are no particular
>>>>>>>>>>>>>>>>   stewards (no need). Users of particular protocols should
>>>>>>>>>>>>>>>>   mainly maintain those and add support for different versions
>>>>>>>>>>>>>>>>   of the protocols.
>>>>>>>>>>>>>>>>   - *software* - both the API and the software are controlled by
>>>>>>>>>>>>>>>>   someone outside of Airflow (a commercial or open-source
>>>>>>>>>>>>>>>>   project), but the deployment of that software is "owned" by
>>>>>>>>>>>>>>>>   the user installing Airflow. The stewards might also be the
>>>>>>>>>>>>>>>>   users, but the controlling party (Oracle, for example) might
>>>>>>>>>>>>>>>>   be interested in maintaining those operators as well.
>>>>>>>>>>>>>>>>   - *providers* - the API/software/deployments are fully
>>>>>>>>>>>>>>>>   controlled by a 3rd party. Here the "provider" will most
>>>>>>>>>>>>>>>>   likely be interested in maintaining the operators (and, like
>>>>>>>>>>>>>>>>   Google for example, in providing integration guidelines
>>>>>>>>>>>>>>>>   <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
>>>>>>>>>>>>>>>>   for their hooks/operators/sensors).
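One possible mechanism for "keeping deprecated import aliases in the old places" is a module-level `__getattr__` (PEP 562, Python 3.7+) in the old module that forwards moved names and emits a `DeprecationWarning`. A minimal sketch, assuming that mechanism; the `sketch_contrib...` and `sketch_providers...` module names are hypothetical and are built in memory so the example runs without Airflow. Python 3.5 and 3.6 do not support module `__getattr__`, so code targeting them would instead re-import the moved names and call `warnings.warn` when the old module is imported.

```python
import importlib
import sys
import types
import warnings
from typing import Dict, Tuple


def install_deprecated_alias_module(
    old_name: str, moved: Dict[str, Tuple[str, str]]
) -> types.ModuleType:
    """Register a module at ``old_name`` that forwards moved names lazily.

    ``moved`` maps an attribute name to ``(new_module_name, new_attribute)``.
    Every lookup through the old module emits a DeprecationWarning.
    """
    module = types.ModuleType(old_name)

    def __getattr__(attr: str):  # PEP 562 module-level hook, Python 3.7+
        if attr not in moved:
            raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")
        new_module_name, new_attr = moved[attr]
        warnings.warn(
            f"{old_name}.{attr} is deprecated; import it from {new_module_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_module_name), new_attr)

    module.__getattr__ = __getattr__  # stored in the module's __dict__
    sys.modules[old_name] = module
    return module


# Hypothetical new location, created in memory so the sketch runs without Airflow.
class S3Hook:
    """Stand-in for a hook that moved into a provider package."""


new_module = types.ModuleType("sketch_providers.amazon.aws.hooks.s3")
new_module.S3Hook = S3Hook
sys.modules[new_module.__name__] = new_module

# Hypothetical old location: still importable, but warns on use.
install_deprecated_alias_module(
    "sketch_contrib.hooks.s3_hook",
    {"S3Hook": ("sketch_providers.amazon.aws.hooks.s3", "S3Hook")},
)
```

With this in place, existing code doing `from sketch_contrib.hooks.s3_hook import S3Hook` keeps working and receives the class from its new home, together with a warning pointing at the new path.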
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 3) Transfer operators between providers should be kept at the
>>>>>>>>>>>>>>>> "target" rather than the "source".
>>>>>>>>>>>>>>>> For example, S3 -> GCS should be in the "google" provider, but
>>>>>>>>>>>>>>>> GCS -> S3 should be in "amazon".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4) Transfer operators with a provider on only one side should be
>>>>>>>>>>>>>>>> kept at that "provider", regardless of whether it is the target
>>>>>>>>>>>>>>>> or the source.
>>>>>>>>>>>>>>>> For example, GCS -> SFTP and SFTP -> GCS should both be in the
>>>>>>>>>>>>>>>> "google" provider.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5) If in doubt, we will discuss individual cases separately.
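Rules 3 and 4 amount to a small decision procedure. A minimal Python sketch, assuming a hand-written mapping from an integration to its sub-package (using names from the package list in this thread) and a set of public protocols; `PACKAGE_OF`, `PUBLIC_PROTOCOLS` and `transfer_operator_package` are hypothetical. The rules do not cover a transfer between two protocols, so the sketch simply falls back to the target there, which is an assumption (rule 5 would send such cases to discussion).

```python
# Hypothetical mapping from an integration to its provider sub-package.
PACKAGE_OF = {
    "s3": "amazon.aws",
    "gcs": "google.cloud",
    "gsheets": "google.gsuite",
    "hive": "apache.hive",
    "vertica": "vertica",
    "sftp": "sftp",
    "ftp": "ftp",
}

# Public protocols have no owner, so they never "win" a transfer operator.
PUBLIC_PROTOCOLS = {"sftp", "ftp"}


def transfer_operator_package(source: str, target: str) -> str:
    """Return the package that a <source> -> <target> transfer operator belongs to."""
    source_is_protocol = source in PUBLIC_PROTOCOLS
    target_is_protocol = target in PUBLIC_PROTOCOLS
    # Rule 4: only the source has an owner, so the operator goes there.
    if target_is_protocol and not source_is_protocol:
        return PACKAGE_OF[source]
    # Rule 3, and rule 4 when the protocol is the source: the target wins.
    # Protocol-to-protocol is not covered by the rules; default to the target.
    return PACKAGE_OF[target]
```

With these assumptions, `s3_to_hive` and `vertica_to_hive` both land in `apache.hive`, and of the Google Sheets cases raised in this thread, GoogleSheetsToS3 would go to amazon while GoogleSheetsToSftp would stay in google.gsuite.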
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Jarek Potiuk
>>>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal
>>>> Software
>>>>>>>>> Engineer
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>> 
> 
> 


Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
On Mon, Nov 11, 2019 at 4:22 PM Kamil Breguła <ka...@polidea.com>
wrote:

> One more question. Are you sure you want to move Python and Bash from
> core? These are the elements that are installed in every environment
> because they are required by Airflow, so moving them to a separate
> installed package is pointless in my opinion.
>

I have no problem with moving them to "fundamentals", but I am not sure
they are really required. I looked through the code, and other than a few
examples and tests, they are not really "required". Maybe that's enough to
keep them in fundamentals.
Also, the Python operator has some dependencies (virtualenv) that are only
required for this operator, so maybe it's worth keeping it separate from
"fundamentals".
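Keeping an operator out of the always-installed core because of an extra dependency usually goes together with checking for that dependency only when the operator runs. A minimal Python sketch using the standard library's `importlib.util.find_spec`; `is_installed` and `require_optional` are hypothetical helpers, and using `virtualenv` as the optional package follows the message above rather than any existing Airflow code.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """True if a top-level package can be imported, checked without importing it."""
    return importlib.util.find_spec(package) is not None


def require_optional(package: str, feature: str) -> None:
    """Raise a helpful ImportError if ``feature`` needs a package that is missing."""
    if not is_installed(package):
        raise ImportError(
            f"{feature} requires the optional '{package}' package; "
            f"install it to use this operator."
        )


# Hypothetical call site, e.g. at the start of an operator's execute(),
# so importing the operator module never fails on a missing extra:
# require_optional("virtualenv", "PythonVirtualenvOperator")
```

Only top-level names are passed to `find_spec` here; for a dotted name it would import the parent package first.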


> On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > I am fine with this list +1
> >
> > On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > I am all for it, Kamil!
> > >
> > > Super happy to treat Apache projects in the same way as "proprietary"
> > > providers :). Does anyone else have other comments?
> > >
> > > J.
> > >
> > > On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.bregula@polidea.com>
> > > wrote:
> > >
> > > > I looked at this list and I'm only worried about two operators.
> > > >
> > > > airflow.contrib.operators.vertica_to_hive
> > > > airflow.contrib.operators.s3_to_hive
> > > >
> > > > If we want the operators to be grouped according to destination, then
> > > > these operators should be in the apache package. It is the members of
> > > > the Apache community who will care most about these operators being of
> > > > high quality. Apache can be treated equally with other large cloud
> > > > providers, such as GCP and AWS. I can imagine that a new Apache product
> > > > will appear and will want to be promoted the same way as the products
> > > > of cloud providers are: by creating a large number of integrations
> > > > that allow you to copy data into its operating range.
> > > > There's another reason: building a strong Apache community. As
> > > > members of the Apache community, we should promote Apache products to
> > > > ensure that the community develops well, and therefore also promote
> > > > the integration of our products with other products.
> > > >
> > > > On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > wrote:
> > > > >
> > > > > Just to select the "packages" for this update: does anyone have
> > > > > objections to this structure? Details, including the transfer
> > > > > operators, are in the spreadsheet:
> > > > >
> > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > > >
> > > > > *Fundamentals (no change)*
> > > > >
> > > > > providers
> > > > >     google
> > > > >         cloud
> > > > >         gsuite
> > > > >         marketing_platform
> > > > >     amazon
> > > > >         aws
> > > > >     microsoft
> > > > >         azure
> > > > >     apache
> > > > >         cassandra
> > > > >         druid
> > > > >         hadoop
> > > > >         hive
> > > > >         pig
> > > > >         pinot
> > > > >         spark
> > > > >         sqoop
> > > > >     mysql
> > > > >     jira
> > > > >     databricks
> > > > >     datadog
> > > > >     dingding
> > > > >     discord
> > > > >     cloudant
> > > > >     jenkins
> > > > >     opsgenie
> > > > >     qubole
> > > > >     salesforce
> > > > >     segment
> > > > >     slack
> > > > >     snowflake
> > > > >     vertica
> > > > >     zendesk
> > > > >     celery
> > > > >     docker
> > > > >     bash
> > > > >     kubernetes
> > > > >     mssql
> > > > >     mongodb
> > > > >     mysql
> > > > >     openfaas
> > > > >     oracle
> > > > >     papermill
> > > > >     postgres
> > > > >     presto
> > > > >     python
> > > > >     redis
> > > > >     samba
> > > > >     sqlite
> > > > >     imap
> > > > >     ssh
> > > > >     filesystem
> > > > >     sftp
> > > > >     ftp
> > > > >     http
> > > > >     grpc
> > > > >     smtp
> > > > >     jdbc
> > > > >     winrm
> > > > >
> > > > >
> > > > > On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > > wrote:
> > > > >
> > > > > > Let me then cancel this vote and I will restart it next week.
> > > > > >
> > > > > > Yeah. It's a bit like re-opening Pandora's box, but now that we know
> > > > > > that we can do it, and we are unblocked in moving to google (which is
> > > > > > now the biggest move in progress), we can spend more time on getting
> > > > > > better (and more final) consensus.
> > > > > > I decided to go through the list from the docs (once again, Kamil,
> > > > > > great that you did it) and prepared this spreadsheet showing the
> > > > > > structure. I went through ALL the operators and put each of them where
> > > > > > our current rules place it.
> > > > > >
> > > > > > After this exercise, I think the following makes sense:
> > > > > > - put all the stuff except fundamentals in *"providers"* (everything
> > > > > > in "providers" will be potentially backportable).
> > > > > > - group apache projects under *"apache"*, similar to
> > > > > > google/amazon/microsoft (a different kind of ownership, but still an
> > > > > > ownership).
> > > > > > - for the rest, I think we can really put the operators in folders per
> > > > > > "service/company" (without sub-packages). That includes sftp/ssh/ftp
> > > > > > etc. (should we group [ftp and sftp] or [ssh and sftp]?). There is no
> > > > > > "ownership" there and no reason to group them. That will put
> > > > > > "operators/hooks/sensors" at different levels in the directory tree,
> > > > > > but we already have that for fundamentals and I am not too worried
> > > > > > about that. We do not have to have everything at the same level.
> > > > > > - I placed transfer operators according to the rule that the "to" side
> > > > > > is more important unless the other side is a public protocol (so
> > > > > > sftp -> gcs and gcs -> sftp both go to google/gcp). I did not have any
> > > > > > doubt about where to put each transfer operator, so this is a good sign:
> > > > > >
> > > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > > > >
> > > > > > Can you please take a look and express your opinions here so that we
> > > > > > can have the final vote next week (for those who are not yet tired of
> > > > > > the discussion ;)).
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > > > >
> > > > > >> Yes, that makes sense.
> > > > > >>
> > > > > >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hadoop is published by Apache, so it can be in the apache
> > > > > >> > directory. This will mimic the grouping presented in the
> > > > > >> > documentation:
> > > > > >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> > > > > >> >
> > > > > >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > I think we should keep the vote open at least until the middle of
> > > > > >> > > next week to get more thoughts and input on this one.
> > > > > >> > >
> > > > > >> > > In general, I am happy with the approach, but operators/hooks and
> > > > > >> > > sensors shouldn't be a provider. "hadoop" can be a provider and hdfs
> > > > > >> > > can be a part of it.
> > > > > >> > >
> > > > > >> > > providers/
> > > > > >> > >     google
> > > > > >> > >          cloud
> > > > > >> > >              operators
> > > > > >> > >              hooks
> > > > > >> > >              sensors
> > > > > >> > >          gsuite
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >     amazon
> > > > > >> > >          aws
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >     microsoft
> > > > > >> > >          azure
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >     hadoop
> > > > > >> > >         hdfs
> > > > > >> > >              operators
> > > > > >> > >              ...
> > > > > >> > >
> > > > > >> > > We can also define what a "provider" is, so we know what to add to
> > > > > >> > > it in the future. SSH/FTP/SFTP belong to the same family. Do we want
> > > > > >> > > to have separate providers for each one of them?
> > > > > >> > >
> > > > > >> > > Regards,
> > > > > >> > > Kaxil
> > > > > >> > >
> > > > > >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com
> > > > > >> >
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > I really like to make everything a provider. That's a
> great
> > > > idea !
> > > > > >> > This way
> > > > > >> > > > everything "backportable" will have to be in "providers"
> > > > package.
> > > > > >> > Really
> > > > > >> > > > nice and clean separation (and less mess in "airflow").
> And we
> > > > will
> > > > > >> not
> > > > > >> > > > have to have any artificial grouping (we can still group
> them
> > > > at the
> > > > > >> > > > documentation level).
> > > > > >> > > >
> > > > > >> > > > We do not need backport in name. And I think it's more of
> > > > technical
> > > > > >> > detail
> > > > > >> > > > on naming the package which we can work out while
> reviewing
> > > PRs
> > > > and
> > > > > >> we
> > > > > >> > can
> > > > > >> > > > agree final naming of the released packaged on PMC level
> (PMCs
> > > > will
> > > > > >> > have to
> > > > > >> > > > vote on releasing those).
> > > > > >> > > >
> > > > > >> > > > The thinking is that it's intention is really to be only
> > > > backported
> > > > > >> to
> > > > > >> > 1.10
> > > > > >> > > > - we are not going (yet) to use the packages in Airflow
> 2.*.
> > > so
> > > > I
> > > > > >> > thought
> > > > > >> > > > by naming them backport we can express that intent more
> > > clearly.
> > > > > >> > > >
> > > > > >> > > > So let me clarify the structure of folders we are going to
> > > have
> > > > if
> > > > > >> we
> > > > > >> > > > follow it (i just added some examples) including the
> already
> > > > agreed
> > > > > >> > changes
> > > > > >> > > > from AIP-21:
> > > > > >> > > >
> > > > > >> > > > providers/
> > > > > >> > > >     google
> > > > > >> > > >          cloud
> > > > > >> > > >              operators
> > > > > >> > > >              hooks
> > > > > >> > > >              sensors
> > > > > >> > > >          gsuite
> > > > > >> > > >              operators
> > > > > >> > > >              ...
> > > > > >> > > >     amazon
> > > > > >> > > >          aws
> > > > > >> > > >              operators
> > > > > >> > > >              ...
> > > > > >> > > >     microsoft
> > > > > >> > > >          azure
> > > > > >> > > >              operators
> > > > > >> > > >              ...
> > > > > >> > > >     operators
> > > > > >> > > >          sqlite.py
> > > > > >> > > >          oracle.py
> > > > > >> > > >          docker.py
> > > > > >> > > >     hooks
> > > > > >> > > >          hdfs.py
> > > > > >> > > >          sqlite.py
> > > > > >> > > >     sensors
> > > > > >> > > >          http.py
> > > > > >> > > >          sql.py
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > J.
> > > > > >> > > >
> > > > > >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <
> > > > ash@apache.org>
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > Do we need to include `-backport,`? What was the
> thinking
> > > > behind
> > > > > >> > that?
> > > > > >> > > > >
> > > > > >> > > > > I think software and protocol should be merged. I would
> also
> > > > say
> > > > > >> > > > > _everything_ is a provider, so
> > > > airflow.providers.ssh.SSHOperator
> > > > > >> for
> > > > > >> > > > > instance is what I would prefer
> > > > > >> > > > >
> > > > > >> > > > > -a
> > > > > >> > > > >
> > > > > >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <
> > > > > >> > Jarek.Potiuk@polidea.com>
> > > > > >> > > > > wrote:
> > > > > >> > > > > >One more day to go. I would love to see some opinions
> on
> > > this
> > > > > >> AIP-21
> > > > > >> > > > > >update
> > > > > >> > > > > >:).
> > > > > >> > > > > >
> > > > > >> > > > > >Executive summary:
> > > > > >> > > > > >
> > > > > >> > > > > >* we will be moving a number of integrations to
> > > sub-packages
> > > > of
> > > > > >> > > > > >airflow.
> > > > > >> > > > > >* they will be backportable to 1.10.*.  There will be
> > > > > >> > > > > >'apache-airflow-[package]-backport' pypi installable
> with
> > > > python
> > > > > >> 3
> > > > > >> > that
> > > > > >> > > > > >will make Airflow 2.0 operators/hooks etc. available
> with
> > > > 1.10*
> > > > > >> > > > > >operators.
> > > > > >> > > > > >* the current proposal for sub-packages is
> > > > > >> > > > > >"protocols/software/providers/"
> > > > > >> > > > > >(but if you think merging protocols and software makes
> > > sense
> > > > -
> > > > > >> > please
> > > > > >> > > > > >express your opinion
> > > > > >> > > > > >* we are not moving "fundamental" operators/hooks etc..
> > > > > >> > > > > >* Airflow 2.0 is still going to be installed as a
> single
> > > > package
> > > > > >> > with
> > > > > >> > > > > >all
> > > > > >> > > > > >operators (so we are not yet implementing AIP-8)
> > > > > >> > > > > >
> > > > > >> > > > > >J.
> > > > > >> > > > > >
> > > > > >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <
> > > > > >> > Jarek.Potiuk@polidea.com>
> > > > > >> > > > > >wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> I think all this cases are valid but maybe I was not
> > > > > >> super-clear.
> > > > > >> > > > > >It's
> > > > > >> > > > > >> only the transfer operators that we need to decide
> where
> > > to
> > > > > >> put -
> > > > > >> > not
> > > > > >> > > > > >> hooks.
> > > > > >> > > > > >> Usually the complexity of communication with
> particular
> > > > > >> storages
> > > > > >> > is
> > > > > >> > > > > >(or at
> > > > > >> > > > > >> least should be) in the Hooks rather than Operators.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Operators should be just thin wrappers over the
> logic in
> > > > the
> > > > > >> > hooks.
> > > > > >> > > > > >> Hooks are going to stay where they belong - S3 Hooks
> in
> > > > amazon,
> > > > > >> > GCS
> > > > > >> > > > > >Hooks
> > > > > >> > > > > >> in google.cloud, GoogleSheet Hooks in google.gsuite.
> > > > > >> > > > > >>
> > > > > >> > > > > >> Since we actually have mono-repo - this will be no
> > > problem
> > > > > >> (and no
> > > > > >> > > > > >cross
> > > > > >> > > > > >> dependencies problem) to have S3 -> GCS operator  in
> > > > google and
> > > > > >> > use
> > > > > >> > > > > >hooks
> > > > > >> > > > > >> from both google/amazon.
> > > > > >> > > > > >>
> > > > > >> > > > > >> I hope this alleviates your concern Daniel ?
> > > > > >> > > > > >>
> > > > > >> > > > > >> J.
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >>> What about GoogleSheetsToS3?  GoogleSheetsToGCS?
> These
> > > > you
> > > > > >> would
> > > > > >> > > > > >put in
> > > > > >> > > > > >>> the target, i.e. the storage?  But
> GoogleSheetsToSftp
> > > > would
> > > > > >> be in
> > > > > >> > > > > >google
> > > > > >> > > > > >>> sheets operators file?  The complexity, and the
> shared
> > > > code,
> > > > > >> are
> > > > > >> > in
> > > > > >> > > > > >the
> > > > > >> > > > > >>> gsheet component -- not into the storage
> destination.
> > > > > >> > > > > >>>
> > > > > >> > > > > >>>
> > > > > >> > > > > >>
> > > > > >> > > > > >>
> > > > > >> > > > > >>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kamil Breguła <ka...@polidea.com>.
One more question. Are you sure you want to move Python and Bash out of
core? These are installed in every environment because Airflow requires
them, so moving them to a separately installed package is pointless in
my opinion.

On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> I am fine with this list +1
>
> On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Ja...@polidea.com> wrote:
>
> > I am all for it, Kamil!
> >
> > Super happy to treat Apache projects the same way as "proprietary"
> > providers :). Does anyone else have other comments?
> >
> > J.
> >
> > On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <ka...@polidea.com> wrote:
> >
> > > I looked at this list and I'm only worried about two operators.
> > >
> > > airflow.contrib.operators.vertica_to_hive
> > > airflow.contrib.operators.s3_to_hive
> > >
> > > If we want the operators to be grouped according to destination, then
> > > these operators should be in the apache package. It is the members of
> > > the Apache community who will care most about these operators being of
> > > high quality. Apache can be treated equally with other large cloud
> > > providers, such as GCP and AWS. I can imagine that a new Apache product
> > > will appear and will want to promote itself the same way products of
> > > cloud providers are promoted: by creating a large number of
> > > integrations that allow you to copy data into its operating range.
> > > There is another case: building a strong Apache community. As
> > > members of the Apache community, we should promote Apache products to
> > > ensure that the community develops properly, and therefore also
> > > promote integrating our products with other products.
> > >
> > > On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Ja...@polidea.com> wrote:
> > > >
> > > > Just to select the "packages" for this update: does anyone have
> > > > objections to this structure? Details, including transfer operators,
> > > > are in:
> > > >
> > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > >
> > > > *Fundamentals (no change)*
> > > >
> > > > providers
> > > >     google
> > > >         cloud
> > > >         gsuite
> > > >         marketing_platform
> > > >     amazon
> > > >         aws
> > > >     microsoft
> > > >         azure
> > > >     apache
> > > >         cassandra
> > > >         druid
> > > >         hadoop
> > > >         hive
> > > >         pig
> > > >         pinot
> > > >         spark
> > > >         sqoop
> > > >     mysql
> > > >     jira
> > > >     databricks
> > > >     datadog
> > > >     dingding
> > > >     discord
> > > >     cloudant
> > > >     jenkins
> > > >     opsgenie
> > > >     qubole
> > > >     salesforce
> > > >     segment
> > > >     slack
> > > >     snowflake
> > > >     vertica
> > > >     zendesk
> > > >     celery
> > > >     docker
> > > >     bash
> > > >     kubernetes
> > > >     mssql
> > > >     mongodb
> > > >     mysql
> > > >     openfaas
> > > >     oracle
> > > >     papermill
> > > >     postgres
> > > >     presto
> > > >     python
> > > >     redis
> > > >     samba
> > > >     sqlite
> > > >     imap
> > > >     ssh
> > > >     filesystem
> > > >     sftp
> > > >     ftp
> > > >     http
> > > >     grpc
> > > >     smtp
> > > >     jdbc
> > > >     winrm
> > > >
> > > > On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Ja...@polidea.com> wrote:
> > > >
> > > > > Let me then cancel this vote, and I will restart it next week.
> > > > >
> > > > > Yeah. It's a bit like reopening Pandora's box, but now that we know
> > > > > that we can do it, and we are unblocked in moving to google (which is
> > > > > now the biggest move in progress), we can spend more time on getting
> > > > > a better (and more final) consensus.
> > > > > I decided to go through the list from the docs (once again, Kamil -
> > > > > great that you did it) and prepared this spreadsheet showing the
> > > > > structure. I went through ALL the operators and put each of them
> > > > > where our current rules place it.
> > > > >
> > > > > After this exercise, I think the following makes sense:
> > > > > - put all the stuff except fundamentals in *"providers"* (everything
> > > > > in "providers" will be potentially backportable).
> > > > > - group apache projects under *"apache"*, similar to
> > > > > google/amazon/microsoft (a different kind of ownership, but it is
> > > > > still ownership).
> > > > > - for the rest, I think we can really put the operators in folders
> > > > > per "service/company" (without sub-packages). That includes
> > > > > sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]?).
> > > > > There is no "ownership" there and no reason to group them. That will
> > > > > put "operators/hooks/sensors" at different levels in the directory
> > > > > tree, but we already have that for fundamentals and I am not too
> > > > > worried about it. We do not have to have everything at the same
> > > > > level.
> > > > > - I placed transfer operators according to the rule that the "to"
> > > > > side is more important unless the other side is a public protocol
> > > > > (so sftp -> gcs and gcs -> sftp both go to google/gcp). I did not
> > > > > have any doubt where to put each transfer operator, so this is a
> > > > > good sign:
> > > > >
> > > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > > >
> > > > > Can you please take a look and express your opinions here, so that
> > > > > we can have a final vote next week (for those who are not yet tired
> > > > > of the discussion ;)).
> > > > >
> > > > > J.
> > > > >
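The placement rule for transfer operators described above (the "to" side wins unless the other side is a public protocol) can be sketched as a small function. This is an illustrative sketch only: the `INTEGRATION_PACKAGE` mapping and the `PROTOCOLS` set are hypothetical subsets chosen for the example, not the contents of the spreadsheet. When both sides are protocols the sketch simply picks the target, although the thread leaves doubtful cases for separate discussion.

```python
# Hypothetical subset of integration -> provider package, loosely following
# the trees discussed in this thread (not an authoritative list).
INTEGRATION_PACKAGE = {
    "gcs": "google.cloud",
    "gsheets": "google.gsuite",
    "s3": "amazon.aws",
    "hive": "apache.hive",
    "vertica": "vertica",
    "sftp": "sftp",
    "ftp": "ftp",
    "http": "http",
}

# Assumption: which integrations count as "public protocols".
PROTOCOLS = {"sftp", "ftp", "http"}


def transfer_operator_package(source: str, target: str) -> str:
    """Return the package a source -> target transfer operator would live in.

    The target ("to") side wins, unless the target is a public protocol and
    the source is not; then the operator stays with the provider side.
    """
    if target in PROTOCOLS and source not in PROTOCOLS:
        return INTEGRATION_PACKAGE[source]
    # provider -> provider, protocol -> provider and (by assumption)
    # protocol -> protocol all go to the target side.
    return INTEGRATION_PACKAGE[target]


if __name__ == "__main__":
    pairs = [("sftp", "gcs"), ("gcs", "sftp"), ("s3", "gcs"),
             ("gcs", "s3"), ("vertica", "hive"), ("s3", "hive")]
    for src, dst in pairs:
        print(f"{src} -> {dst}: {transfer_operator_package(src, dst)}")
```

With this mapping, sftp -> gcs and gcs -> sftp both land in google.cloud, and the two Hive operators Kamil flags (vertica -> hive, s3 -> hive) land in apache.hive, the destination-based grouping he argues for.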
> > > > > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > > >
> > > > >> Yes, that makes sense.
> > > > >>
> > > > >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
> > > > >>
> > > > >> > Hadoop is published by Apache, so it can be in the apache
> > > > >> > directory. This will mimic the grouping presented in the
> > > > >> > documentation:
> > > > >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> > > > >> >
> > > > >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > > >> > >
> > > > >> > > I think we should keep the vote open at least until the middle
> > > > >> > > of next week to get more thoughts and input on this one.
> > > > >> > >
> > > > >> > > In general, I am happy with the approach, but operators, hooks
> > > > >> > > and sensors shouldn't be treated as providers. "hadoop" can be
> > > > >> > > the provider, and hdfs can be a part of it.
> > > > >> > >
> > > > >> > > providers/
> > > > >> > >     google
> > > > >> > >          cloud
> > > > >> > >              operators
> > > > >> > >              hooks
> > > > >> > >              sensors
> > > > >> > >          gsuite
> > > > >> > >              operators
> > > > >> > >              ...
> > > > >> > >     amazon
> > > > >> > >          aws
> > > > >> > >              operators
> > > > >> > >              ...
> > > > >> > >     microsoft
> > > > >> > >          azure
> > > > >> > >              operators
> > > > >> > >              ...
> > > > >> > >     hadoop
> > > > >> > >         hdfs
> > > > >> > >              operators
> > > > >> > >              ...
> > > > >> > >
> > > > >> > > We can also define what a "provider" is, so we know what to add
> > > > >> > > to it in the future. SSH/FTP/SFTP belong to the same family. Do
> > > > >> > > we want to have separate providers for each one of them?
> > > > >> > >
> > > > >> > > Regards,
> > > > >> > > Kaxil
> > > > >> > >
> > > > >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > >> > >
> > > > >> > > > I really like making everything a provider. That's a great
> > > > >> > > > idea! This way everything "backportable" will have to be in
> > > > >> > > > the "providers" package. Really nice and clean separation (and
> > > > >> > > > less mess in "airflow"). And we will not need any artificial
> > > > >> > > > grouping (we can still group them at the documentation level).
> > > > >> > > >
> > > > >> > > > We do not need "backport" in the name. I think it's more of a
> > > > >> > > > technical detail of naming the package, which we can work out
> > > > >> > > > while reviewing PRs, and we can agree on the final naming of
> > > > >> > > > the released packages at the PMC level (the PMC will have to
> > > > >> > > > vote on releasing those).
> > > > >> > > >
> > > > >> > > > The thinking was that the packages are really intended only to
> > > > >> > > > be backported to 1.10; we are not (yet) going to use them in
> > > > >> > > > Airflow 2.*. So I thought that naming them "backport" would
> > > > >> > > > express that intent more clearly.
> > > > >> > > >
> > > > >> > > > So let me clarify the folder structure we are going to have if
> > > > >> > > > we follow it (I just added some examples), including the
> > > > >> > > > already agreed changes from AIP-21:
> > > > >> > > >
> > > > >> > > > providers/
> > > > >> > > >     google
> > > > >> > > >          cloud
> > > > >> > > >              operators
> > > > >> > > >              hooks
> > > > >> > > >              sensors
> > > > >> > > >          gsuite
> > > > >> > > >              operators
> > > > >> > > >              ...
> > > > >> > > >     amazon
> > > > >> > > >          aws
> > > > >> > > >              operators
> > > > >> > > >              ...
> > > > >> > > >     microsoft
> > > > >> > > >          azure
> > > > >> > > >              operators
> > > > >> > > >              ...
> > > > >> > > >     operators
> > > > >> > > >          sqlite.py
> > > > >> > > >          oracle.py
> > > > >> > > >          docker.py
> > > > >> > > >     hooks
> > > > >> > > >          hdfs.py
> > > > >> > > >          sqlite.py
> > > > >> > > >     sensors
> > > > >> > > >          http.py
> > > > >> > > >          sql.py
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > J.
> > > > >> > > >
> > > > >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > >> > > >
> > > > >> > > > > Do we need to include `-backport`? What was the thinking
> > > > >> > > > > behind that?
> > > > >> > > > >
> > > > >> > > > > I think software and protocol should be merged. I would also
> > > > >> > > > > say _everything_ is a provider, so, for instance,
> > > > >> > > > > airflow.providers.ssh.SSHOperator is what I would prefer.
> > > > >> > > > >
> > > > >> > > > > -a
> > > > >> > > > >
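Moving an operator such as SSHOperator to a path like the one Ash prefers would break existing DAGs that import it from the old location; the vote email at the top of this thread proposes keeping deprecated import aliases in the old places. Below is a minimal sketch of such an alias, assuming Python 3.7+ (module-level `__getattr__`, PEP 562). The module names `demo_new_ssh` and `demo_old_ssh`, the helper `install_deprecated_alias` and the placeholder `SSHOperator` class are all hypothetical stand-ins, not Airflow's real modules, and this is not necessarily how Airflow implements its deprecation shims.

```python
import sys
import types
import warnings


def install_deprecated_alias(old_name, new_module, names):
    """Register `old_name` in sys.modules as a thin alias of `new_module`.

    Accessing any of `names` through the old module still works but emits a
    DeprecationWarning that points at the new location.
    """
    alias = types.ModuleType(old_name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the alias fails.
        if attr in names:
            warnings.warn(
                f"{old_name}.{attr} is deprecated; "
                f"use {new_module.__name__}.{attr} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(new_module, attr)
        raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")

    alias.__getattr__ = __getattr__
    sys.modules[old_name] = alias
    return alias


# Hypothetical "new" module standing in for a provider package.
new_ssh = types.ModuleType("demo_new_ssh")


class SSHOperator:
    """Placeholder class; not Airflow's SSHOperator."""


new_ssh.SSHOperator = SSHOperator
sys.modules["demo_new_ssh"] = new_ssh

install_deprecated_alias("demo_old_ssh", new_ssh, {"SSHOperator"})
```

Code that still runs `from demo_old_ssh import SSHOperator` gets the very same class object plus a DeprecationWarning, while `from demo_new_ssh import SSHOperator` is silent, so DAGs can migrate gradually.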
> > > > >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > >> > > > > >One more day to go. I would love to see some opinions on this
> > > > >> > > > > >AIP-21 update :).
> > > > >> > > > > >
> > > > >> > > > > >Executive summary:
> > > > >> > > > > >
> > > > >> > > > > >* we will be moving a number of integrations to sub-packages
> > > > >> > > > > >of airflow.
> > > > >> > > > > >* they will be backportable to 1.10.*. There will be
> > > > >> > > > > >'apache-airflow-[package]-backport' packages, installable from
> > > > >> > > > > >PyPI with Python 3, that will make Airflow 2.0 operators/hooks
> > > > >> > > > > >etc. available in Airflow 1.10.*.
> > > > >> > > > > >* the current proposal for sub-packages is
> > > > >> > > > > >"protocols/software/providers/" (but if you think merging
> > > > >> > > > > >protocols and software makes sense, please express your
> > > > >> > > > > >opinion).
> > > > >> > > > > >* we are not moving "fundamental" operators/hooks etc.
> > > > >> > > > > >* Airflow 2.0 is still going to be installed as a single
> > > > >> > > > > >package with all operators (so we are not yet implementing
> > > > >> > > > > >AIP-8).
> > > > >> > > > > >
> > > > >> > > > > >J.
> > > > >> > > > > >
> > > > >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > >> I think all these cases are valid, but maybe I was not super clear.
> > > > >> > > > > >> It's only the transfer operators whose placement we need to decide,
> > > > >> > > > > >> not hooks. Usually the complexity of communicating with a particular
> > > > >> > > > > >> storage is (or at least should be) in the hooks rather than in the
> > > > >> > > > > >> operators.
> > > > >> > > > > >>
> > > > >> > > > > >> Operators should be just thin wrappers over the logic in the hooks.
> > > > >> > > > > >> Hooks are going to stay where they belong: S3 hooks in amazon, GCS
> > > > >> > > > > >> hooks in google.cloud, Google Sheets hooks in google.gsuite.
> > > > >> > > > > >>
> > > > >> > > > > >> Since we actually have a mono-repo, it will be no problem (and there
> > > > >> > > > > >> will be no cross-dependency problem) to have the S3 -> GCS operator
> > > > >> > > > > >> in google and use hooks from both google and amazon.
> > > > >> > > > > >>
> > > > >> > > > > >> I hope this alleviates your concern, Daniel?
> > > > >> > > > > >>
> > > > >> > > > > >> J.
> > > > >> > > > > >>
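Jarek's argument (operators are thin wrappers over hooks, so a google-side transfer operator can simply use an amazon-side hook from the same repo) can be illustrated with a small sketch. All classes below are hypothetical in-memory placeholders; they are not Airflow's S3Hook, GCSHook or BaseOperator and do not mirror their signatures.

```python
from typing import Dict, Iterable


class InMemoryS3Hook:
    """Hypothetical stand-in for a hook living in the amazon package."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def read(self, key: str) -> bytes:
        return self._objects[key]


class InMemoryGCSHook:
    """Hypothetical stand-in for a hook living in the google package."""

    def __init__(self) -> None:
        self.objects = {}  # type: Dict[str, bytes]

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class S3ToGCSTransfer:
    """Thin transfer "operator": it only wires the two hooks together.

    All storage-specific logic stays in the hooks, so the operator can live
    in the google (target) package while importing an amazon hook.
    """

    def __init__(self, source_hook: InMemoryS3Hook,
                 target_hook: InMemoryGCSHook, keys: Iterable[str]) -> None:
        self.source_hook = source_hook
        self.target_hook = target_hook
        self.keys = list(keys)

    def execute(self) -> int:
        # Read each key through the source hook, write it through the target.
        for key in self.keys:
            self.target_hook.write(key, self.source_hook.read(key))
        return len(self.keys)


# Usage: copy two objects between the in-memory stores.
s3 = InMemoryS3Hook({"a.csv": b"1,2", "b.csv": b"3,4"})
gcs = InMemoryGCSHook()
copied = S3ToGCSTransfer(s3, gcs, ["a.csv", "b.csv"]).execute()
```

Because the operator holds no storage logic of its own, moving it between packages changes only its import location, not its behaviour.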
> > > > >> > > > > >>
> > > > >> > > > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put
> > > > >> > > > > >>> in the target, i.e. the storage? But GoogleSheetsToSftp would be in
> > > > >> > > > > >>> the google sheets operators file? The complexity, and the shared
> > > > >> > > > > >>> code, are in the gsheet component, not in the storage destination.
> > > > >> > > > > >>>
> > > > >> > > > > >>>
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >> --
> > > > >> > > > > >>
> > > > >> > > > > >> Jarek Potiuk
> > > > >> > > > > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >> > > > > >>
> > > > >> > > > > >> M: +48 660 796 129 <+48660796129>
> > > > >> > > > > >> [image: Polidea] <https://www.polidea.com/>
> > > > >> > > > > >>
> > > > >> > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kaxil Naik <ka...@gmail.com>.
I am fine with this list +1

On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <Ja...@polidea.com> wrote:

> I am all for it, Kamil!
>
> Super happy to treat Apache projects in the same way as "proprietary"
> providers :). Does anyone else have other comments?
>
> J.
>
> On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <ka...@polidea.com> wrote:
>
> > I looked at this list and I'm only worried about two operators.
> >
> > airflow.contrib.operators.vertica_to_hive
> > airflow.contrib.operators.s3_to_hive
> >
> > If we want the operators to be grouped according to destination, then
> > these operators should be in the apache package. It is the members of the
> > Apache community who will care most about these operators being of high
> > quality. Apache can be treated equally with other large cloud
> > providers, such as GCP and AWS. I can imagine that a new Apache product
> > will appear and will want to promote itself the same way products of
> > cloud providers are promoted: by creating a large number of
> > integrations that let users copy data into its ecosystem.
> > There is another reason: building a strong Apache community. As
> > members of the Apache community, we should promote Apache products to
> > ensure the community develops well, and that includes integrating our
> > products with other products.
> >
> > On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Ja...@polidea.com> wrote:
> > >
> > > Just to settle the "packages" for this update: does anyone have objections
> > > to this structure? Details, including the transfer operators, are in
> > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > >
> > > *Fundamentals (no change)*
> > >
> > > providers
> > >     google
> > >         cloud
> > >         gsuite
> > >         marketing_platform
> > >     amazon
> > >         aws
> > >     microsoft
> > >         azure
> > >     apache
> > >         cassandra
> > >         druid
> > >         hadoop
> > >         hive
> > >         pig
> > >         pinot
> > >         spark
> > >         sqoop
> > >     mysql
> > >     jira
> > >     databricks
> > >     datadog
> > >     dingding
> > >     discord
> > >     cloudant
> > >     jenkins
> > >     opsgenie
> > >     qubole
> > >     salesforce
> > >     segment
> > >     slack
> > >     snowflake
> > >     vertica
> > >     zendesk
> > >     celery
> > >     docker
> > >     bash
> > >     kubernetes
> > >     mssql
> > >     mongodb
> > >     mysql
> > >     openfaas
> > >     oracle
> > >     papermill
> > >     postgres
> > >     presto
> > >     python
> > >     redis
> > >     samba
> > >     sqlite
> > >     imap
> > >     ssh
> > >     filesystem
> > >     sftp
> > >     ftp
> > >     http
> > >     grpc
> > >     smtp
> > >     jdbc
> > >     winrm
> > >
> > > On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Ja...@polidea.com> wrote:
> > >
> > > > Let me then cancel this vote; I will restart it next week.
> > > >
> > > > Yeah. It's a bit like re-opening Pandora's box, but now that we know
> > > > that we can do it, and we are unblocked in moving to google (which is
> > > > now the biggest move in progress), we can spend more time on reaching a
> > > > better (and more final) consensus.
> > > > I decided to go through the list from the docs (once again, Kamil, great
> > > > that you did it) and prepared this spreadsheet showing the structure. I
> > > > went through ALL the operators and put each of them where our current
> > > > rules place it.
> > > >
> > > > After this exercise, I think the following makes sense:
> > > > - put all the stuff except fundamentals in *"providers"* (everything
> > > > in "providers" will be potentially backportable).
> > > > - group apache projects under *"apache"*, similar to
> > > > google/amazon/microsoft (a different kind of ownership, but still an
> > > > ownership).
> > > > - for the rest, I think we can really put the operators in folders per
> > > > "service/company" (without sub-packages). That includes sftp/ssh/ftp
> > > > etc. (should we group [ftp and sftp] or [ssh and sftp]?). There is no
> > > > "ownership" there and no reason to group them. That will put
> > > > "operators/hooks/sensors" at different levels in the directory tree,
> > > > but we already have that for fundamentals and I am not too worried
> > > > about it. We do not have to have everything at the same level.
> > > > - I put transfer operators according to the rule that the "to" side is
> > > > more important unless the other side is a public protocol (so sftp -> gcs
> > > > and gcs -> sftp both go to google/gcp). I had no doubt where to put
> > > > each transfer operator, so this is a good sign:
> > > >
> > > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > > >
> > > > Can you please take a look and express your opinions here so that we
> > > > can have a final vote next week (for those who are not yet tired of the
> > > > discussion ;)).
> > > >
> > > > J.
> > > >
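The transfer-operator rule above can be expressed as a small decision function. This is a minimal sketch: `package_for_transfer`, the `SERVICE_PACKAGE` table and the `PUBLIC_PROTOCOLS` set are hypothetical and cover only services named in this thread. The thread does not say what happens when both sides are protocols, so the sketch simply falls back to the target there.

```python
from typing import Dict, Set

# Hypothetical service -> top-level package table, limited to examples
# mentioned in this thread.
SERVICE_PACKAGE = {
    "s3": "amazon",
    "gcs": "google",
    "gsheets": "google",
    "hive": "apache",
    "vertica": "vertica",
}  # type: Dict[str, str]

# Assumed set of public protocols, i.e. integrations nobody "owns".
PUBLIC_PROTOCOLS = {"sftp", "ftp", "ssh", "http", "smtp", "imap"}  # type: Set[str]


def package_for_transfer(source: str, target: str) -> str:
    """Return the package a `source -> target` transfer operator would go to.

    The "to" side decides, except that a public protocol never owns the
    operator: if the target is a protocol, the source side decides instead.
    When both sides are protocols, the sketch falls back to the target.
    """
    if target in PUBLIC_PROTOCOLS and source not in PUBLIC_PROTOCOLS:
        return SERVICE_PACKAGE.get(source, source)
    return SERVICE_PACKAGE.get(target, target)
```

With these tables, `s3 -> gcs` lands in google, `gcs -> s3` in amazon, and both `sftp -> gcs` and `gcs -> sftp` in google, matching the examples in the thread; Kamil's `vertica -> hive` and `s3 -> hive` cases land in apache.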
> > > > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > >
> > > >> Yes, that makes sense.
> > > >>
> > > >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
> > > >>
> > > >> > Hadoop is published by Apache, so it can go in the apache
> > > >> > directory. This will mimic the grouping presented in the
> > > >> > documentation:
> > > >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> > > >> >
> > > >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > >> > >
> > > >> > > I think we should keep the vote open at least until mid next week to
> > > >> > > gather more thoughts and input on this one.
> > > >> > >
> > > >> > > In general, I am happy with the approach, but operators/hooks/sensors
> > > >> > > shouldn't be providers themselves. "hadoop" can be its own provider and
> > > >> > > hdfs can be a part of it.
> > > >> > >
> > > >> > > providers/
> > > >> > >     google
> > > >> > >          cloud
> > > >> > >              operators
> > > >> > >              hooks
> > > >> > >              sensors
> > > >> > >          gsuite
> > > >> > >              operators
> > > >> > >              ...
> > > >> > >     amazon
> > > >> > >          aws
> > > >> > >              operators
> > > >> > >              ...
> > > >> > >     microsoft
> > > >> > >          azure
> > > >> > >              operators
> > > >> > >              ...
> > > >> > >     hadoop
> > > >> > >         hdfs
> > > >> > >              operators
> > > >> > >              ...
> > > >> > >
> > > >> > > We can also define what a "provider" is, so we know what to add to it
> > > >> > > in the future. SSH/FTP/SFTP belong to the same family. Do we want to
> > > >> > > have separate providers for each one of them?
> > > >> > >
> > > >> > > Regards,
> > > >> > > Kaxil
> > > >> > >
> > > >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > >> > >
> > > >> > > > I really like making everything a provider. That's a great idea! This
> > > >> > > > way everything "backportable" will have to be in the "providers"
> > > >> > > > package. Really nice and clean separation (and less mess in "airflow").
> > > >> > > > And we will not need any artificial grouping (we can still group them
> > > >> > > > at the documentation level).
> > > >> > > >
> > > >> > > > We do not need "backport" in the name. I think it's more of a technical
> > > >> > > > detail of package naming, which we can work out while reviewing PRs,
> > > >> > > > and we can agree on the final naming of the released packages at the
> > > >> > > > PMC level (the PMC will have to vote on releasing those).
> > > >> > > >
> > > >> > > > The thinking was that these packages are really intended to be
> > > >> > > > backported to 1.10 only; we are not going (yet) to use them in Airflow
> > > >> > > > 2.*, so I thought naming them "backport" would express that intent
> > > >> > > > more clearly.
> > > >> > > >
> > > >> > > > So let me clarify the folder structure we are going to have if we
> > > >> > > > follow it (I just added some examples), including the already agreed
> > > >> > > > changes from AIP-21:
> > > >> > > >
> > > >> > > > providers/
> > > >> > > >     google
> > > >> > > >          cloud
> > > >> > > >              operators
> > > >> > > >              hooks
> > > >> > > >              sensors
> > > >> > > >          gsuite
> > > >> > > >              operators
> > > >> > > >              ...
> > > >> > > >     amazon
> > > >> > > >          aws
> > > >> > > >              operators
> > > >> > > >              ...
> > > >> > > >     microsoft
> > > >> > > >          azure
> > > >> > > >              operators
> > > >> > > >              ...
> > > >> > > >     operators
> > > >> > > >          sqlite.py
> > > >> > > >          oracle.py
> > > >> > > >          docker.py
> > > >> > > >     hooks
> > > >> > > >          hdfs.py
> > > >> > > >          sqlite.py
> > > >> > > >     sensors
> > > >> > > >          http.py
> > > >> > > >          sql.py
> > > >> > > >
> > > >> > > >
> > > >> > > > J.
> > > >> > > >
> > > >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <
> > ash@apache.org>
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Do we need to include `-backport,`? What was the thinking
> > behind
> > > >> > that?
> > > >> > > > >
> > > >> > > > > I think software and protocol should be merged. I would also
> > say
> > > >> > > > > _everything_ is a provider, so
> > airflow.providers.ssh.SSHOperator
> > > >> for
> > > >> > > > > instance is what I would prefer
> > > >> > > > >
> > > >> > > > > -a
> > > >> > > > >
> > > >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <
> > > >> > Jarek.Potiuk@polidea.com>
> > > >> > > > > wrote:
> > > >> > > > > >One more day to go. I would love to see some opinions on
> this
> > > >> AIP-21
> > > >> > > > > >update
> > > >> > > > > >:).
> > > >> > > > > >
> > > >> > > > > >Executive summary:
> > > >> > > > > >
> > > >> > > > > >* we will be moving a number of integrations to
> sub-packages
> > of
> > > >> > > > > >airflow.
> > > >> > > > > >* they will be backportable to 1.10.*.  There will be
> > > >> > > > > >'apache-airflow-[package]-backport' pypi installable with
> > python
> > > >> 3
> > > >> > that
> > > >> > > > > >will make Airflow 2.0 operators/hooks etc. available with
> > 1.10*
> > > >> > > > > >operators.
> > > >> > > > > >* the current proposal for sub-packages is
> > > >> > > > > >"protocols/software/providers/"
> > > >> > > > > >(but if you think merging protocols and software makes
> sense
> > -
> > > >> > please
> > > >> > > > > >express your opinion
> > > >> > > > > >* we are not moving "fundamental" operators/hooks etc..
> > > >> > > > > >* Airflow 2.0 is still going to be installed as a single
> > package
> > > >> > with
> > > >> > > > > >all
> > > >> > > > > >operators (so we are not yet implementing AIP-8)
> > > >> > > > > >
> > > >> > > > > >J.
> > > >> > > > > >
> > > >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <
> > > >> > Jarek.Potiuk@polidea.com>
> > > >> > > > > >wrote:
> > > >> > > > > >
> > > >> > > > > >> I think all this cases are valid but maybe I was not
> > > >> super-clear.
> > > >> > > > > >It's
> > > >> > > > > >> only the transfer operators that we need to decide where
> to
> > > >> put -
> > > >> > not
> > > >> > > > > >> hooks.
> > > >> > > > > >> Usually the complexity of communication with particular
> > > >> storages
> > > >> > is
> > > >> > > > > >(or at
> > > >> > > > > >> least should be) in the Hooks rather than Operators.
> > > >> > > > > >>
> > > >> > > > > >> Operators should be just thin wrappers over the logic in
> > the
> > > >> > hooks.
> > > >> > > > > >> Hooks are going to stay where they belong - S3 Hooks in
> > amazon,
> > > >> > GCS
> > > >> > > > > >Hooks
> > > >> > > > > >> in google.cloud, GoogleSheet Hooks in google.gsuite.
> > > >> > > > > >>
> > > >> > > > > >> Since we actually have mono-repo - this will be no
> problem
> > > >> (and no
> > > >> > > > > >cross
> > > >> > > > > >> dependencies problem) to have S3 -> GCS operator  in
> > google and
> > > >> > use
> > > >> > > > > >hooks
> > > >> > > > > >> from both google/amazon.
> > > >> > > > > >>
> > > >> > > > > >> I hope this alleviates your concern Daniel ?
> > > >> > > > > >>
> > > >> > > > > >> J.
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >>> What about GoogleSheetsToS3?  GoogleSheetsToGCS?  These
> > you
> > > >> would
> > > >> > > > > >put in
> > > >> > > > > >>> the target, i.e. the storage?  But GoogleSheetsToSftp
> > would
> > > >> be in
> > > >> > > > > >google
> > > >> > > > > >>> sheets operators file?  The complexity, and the shared
> > code,
> > > >> are
> > > >> > in
> > > >> > > > > >the
> > > >> > > > > >>> gsheet component -- not into the storage destination.
> > > >> > > > > >>>
> > > >> > > > > >>>
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk
> > > >> > > > > ><Ja...@polidea.com>
> > > >> > > > > >>> wrote:
> > > >> > > > > >>>
> > > >> > > > > >>> > Hello Airflow Community,
> > > >> > > > > >>> >
> > > >> > > > > >>> > The email calls for a vote to update AIP-21 Changes in
> > > >> import
> > > >> > > > > >paths
> > > >> > > > > >>> > <
> > > >> > > > > >>> >
> > > >> > > > > >>>
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
> > > >> > > > > >>> > >
> > > >> > > > > >>> > with
> > > >> > > > > >>> > the changes described below. The vote will last till
> > > >> Saturday
> > > >> > 8th
> > > >> > > > > >2am
> > > >> > > > > >>> CEST
> > > >> > > > > >>> > (72 hours). Committers have a binding vote but
> everyone
> > from
> > > >> > the
> > > >> > > > > >>> community
> > > >> > > > > >>> > is encouraged to cast an advisory vote.
> > > >> > > > > >>> >
> > > >> > > > > >>> > *Summary*:
> > > >> > > > > >>> >
> > > >> > > > > >>> > The proposal is to update AIP-21 to move all non-core
> > > >> > > > > >>> > operators/hooks/sensor (and related files) to
> > sub-packages
> > > >> > within
> > > >> > > > > >>> airflow
> > > >> > > > > >>> > (protocols/software/providers) or
> (software/providers).
> > > >> > > > > >>> > I am also happy to merge protocols+software, so if you
> > have
> > > >> a
> > > >> > > > > >strong
> > > >> > > > > >>> > opinion on it - please state it with your vote and we
> > can
> > > >> > decide
> > > >> > > > > >based
> > > >> > > > > >>> on
> > > >> > > > > >>> > majority.
> > > >> > > > > >>> >
> > > >> > > > > >>> > Those packages will be separately released
> > (schedule/process
> > > >> > TBD)
> > > >> > > > > >and
> > > >> > > > > >>> will
> > > >> > > > > >>> > be backportable to 1.10.* airflow series, so that
> users
> > can
> > > >> > > > > >install it
> > > >> > > > > >>> and
> > > >> > > > > >>> > start using new Airflow2.0 operators in their Python 3
> > > >> Airflow
> > > >> > > > > >1.10
> > > >> > > > > >>> > environments (only Python 3.5+ is supported).
> > > >> > > > > >>> >
> > > >> > > > > >>> > We will proceed with migrating the providers package
> to
> > > >> already
> > > >> > > > > >agreed
> > > >> > > > > >>> > paths without waiting for the final vote (following
> > current
> > > >> > > > > >version of
> > > >> > > > > >>> > AIP-21). Since we have working POC - we know the
> agreed
> > > >> paths
> > > >> > will
> > > >> > > > > >work
> > > >> > > > > >>> for
> > > >> > > > > >>> > us.
> > > >> > > > > >>> >
> > > >> > > > > >>> > *Previous discussions: *
> > > >> > > > > >>> >
> > > >> > > > > >>> >    -
> > > >> > > > > >>> >
> > > >> > > > > >>> >
> > > >> > > > > >>>
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> >
> > > >>
> >
> https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> > > >> > > > > >>> >    -
> > > >> > > > > >>> >
> > > >> > > > > >>> >
> > > >> > > > > >>>
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> >
> > > >>
> >
> https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> > > >> > > > > >>> >
> > > >> > > > > >>> > *More Details*:
> > > >> > > > > >>> >
> > > >> > > > > >>> > 1) We are going in the direction of AIP-8 but not yet reaching it,
> > > >> > > > > >>> > focusing on separating out backportable packages installable in
> > > >> > > > > >>> > Airflow 1.10.* releases. Airflow 2.0 will still be installed as a
> > > >> > > > > >>> > whole and all the source will be kept in one repo, but we now have a
> > > >> > > > > >>> > way to build backportable packages for groups of operators. POC
> > > >> > > > > >>> > available here: https://github.com/apache/airflow/pull/6507 (based on
> > > >> > > > > >>> > Ash's https://github.com/ashb/airflow-submodule-test)
> > > >> > > > > >>> >
> > > >> > > > > >>> > 2) We move all integrations to new packages (keeping deprecated import
> > > >> > > > > >>> > aliases in the old places), with the following split (according to
> > > >> > > > > >>> > "stewardship" over the integrations):
> > > >> > > > > >>> >
> > > >> > > > > >>> >    - *fundamentals* - the core of Airflow; they are really part of
> > > >> > > > > >>> >    Apache Airflow. Stewards: the core Airflow team. Not
> > > >> > > > > >>> >    backportable/separated out.
> > > >> > > > > >>> >    - *protocols* - not owned by anyone; they are public and the
> > > >> > > > > >>> >    implementation is fully "open". There are no particular stewards
> > > >> > > > > >>> >    (no need). Users of particular protocols should mainly maintain
> > > >> > > > > >>> >    those and add support for different versions of the protocols.
> > > >> > > > > >>> >    - *software* - both the API and the software are controlled by
> > > >> > > > > >>> >    someone outside of Airflow (a commercial or open-source project),
> > > >> > > > > >>> >    but the deployment of that software is "owned" by the user
> > > >> > > > > >>> >    installing Airflow. The "stewards" might also be the users, but the
> > > >> > > > > >>> >    controlling party (Oracle, for example) might be interested in
> > > >> > > > > >>> >    maintaining those operators as well.
> > > >> > > > > >>> >    - *providers* - API/software/deployments are fully controlled by a
> > > >> > > > > >>> >    3rd party. Here the "provider" will most likely be interested in
> > > >> > > > > >>> >    maintaining the operators (and, like Google for example, in
> > > >> > > > > >>> >    providing integration guidelines
> > > >> > > > > >>> >    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
> > > >> > > > > >>> >    for their hooks/operators/sensors)
> > > >> > > > > >>> >
> > > >> > > > > >>> >
> > > >> > > > > >>> > 3) Transfer operators between providers should be kept at the "target"
> > > >> > > > > >>> > rather than the "source". For example, S3 -> GCS should be in the
> > > >> > > > > >>> > "google" provider, but GCS -> S3 should be in "amazon".
> > > >> > > > > >>> >
> > > >> > > > > >>> > 4) Transfer operators with a provider on only one side should be kept
> > > >> > > > > >>> > at the "provider", regardless of whether it is the target or the
> > > >> > > > > >>> > source. For example, GCS -> SFTP and SFTP -> GCS should both be in
> > > >> > > > > >>> > the "google" provider.
> > > >> > > > > >>> >
> > > >> > > > > >>> > 5) If in doubt, we will discuss individual cases separately.
> > > >> > > > > >>> >
> > > >> > > > > >>> > J.
> > > >> > > > > >>> >
>
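Point 2 of the proposal above keeps deprecated import aliases at the old paths. One way such an alias could behave is sketched below. It assumes Python 3.7+ (PEP 562 module-level `__getattr__`, newer than the 3.5 floor mentioned in the vote; on 3.5/3.6 a plain re-export with an import-time warning would likely be needed). The module names and the `SlackOperator` class are hypothetical stand-ins built in memory, not the real Airflow modules or the mechanism used in the POC.

```python
import types
import warnings

# Hypothetical new home of an operator after the move (built in memory for the sketch).
new_mod = types.ModuleType("providers_sketch.slack.operators")


class SlackOperator:
    """Stand-in for an operator class that moved to the new package."""


new_mod.SlackOperator = SlackOperator

# Hypothetical old "contrib" module that only survives as a deprecated alias.
old_mod = types.ModuleType("contrib_sketch.operators.slack_operator")


def _deprecated_getattr(name):
    """Called only for names missing from old_mod (PEP 562); forwards them with a warning."""
    if hasattr(new_mod, name):
        warnings.warn(
            f"{old_mod.__name__}.{name} is deprecated; "
            f"import it from {new_mod.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_mod, name)
    raise AttributeError(f"module {old_mod.__name__!r} has no attribute {name!r}")


# Stored in the module's __dict__, so attribute lookups on old_mod fall back to it.
old_mod.__getattr__ = _deprecated_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = old_mod.SlackOperator  # the old path still resolves...
print(cls is SlackOperator, caught[0].category.__name__)  # ...but emits a DeprecationWarning
```

The old module stays importable and returns the very same class object, so existing DAGs keep working while users are nudged towards the new path.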

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
I am all for it, Kamil!

Super happy to treat Apache projects the same way as "proprietary"
providers :). Does anyone else have other comments?

J.
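The grouping agreed on here (Apache projects under `apache`, alongside `google`, `amazon` and `microsoft`, with every other integration in its own flat folder under `providers`, as in the list quoted below) can be expressed as a small lookup. This is only a sketch: the table is an abbreviated, hypothetical excerpt of the spreadsheet, and `provider_package` is an illustrative helper, not an Airflow API.

```python
# Abbreviated, hypothetical excerpt of the proposed tree: group -> integrations.
GROUPED = {
    "google": {"cloud", "gsuite", "marketing_platform"},
    "amazon": {"aws"},
    "microsoft": {"azure"},
    "apache": {"cassandra", "druid", "hadoop", "hive", "pig", "pinot", "spark", "sqoop"},
}


def provider_package(integration: str) -> str:
    """Return the (hypothetical) dotted package path for an integration."""
    for group, members in GROUPED.items():
        if integration in members:
            # Owned integrations live one level down, e.g. providers.apache.hive.
            return f"airflow.providers.{group}.{integration}"
    # Everything else gets one flat folder per service directly under providers.
    return f"airflow.providers.{integration}"


for name in ("hive", "aws", "slack", "sftp"):
    print(name, "->", provider_package(name))
```

Treating `apache` as just another key in the table is exactly the "same way as proprietary providers" idea: adding a new Apache project means adding one entry, not a new rule.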

On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <ka...@polidea.com>
wrote:

> I looked at this list and I'm only worried about two operators.
>
> airflow.contrib.operators.vertica_to_hive
> airflow.contrib.operators.s3_to_hive
>
> If we want the operators to be grouped according to destination, then
> these operators should be in the apache package. It is the members of the
> Apache community who will care most about these operators being of high
> quality. Apache can be treated equally with other large cloud
> providers, such as GCP and AWS. I can imagine that a new Apache product
> will appear and will want to be promoted the same way as the products of
> cloud providers are: by creating a large number of integrations that
> allow you to copy data into its ecosystem.
> There is another consideration: building a strong Apache community. As
> members of the Apache community, we should promote Apache products to
> ensure that the community develops properly, and therefore also promote
> integrating our products with other products.
>
> On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
> >
> > Just to settle the "packages" for this update: does anyone have objections
> > to this structure? Details, including transfer operators, are in:
> >
> > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> >
> > *Fundamentals (no change)*
> >
> > providers
> >     google
> >         cloud
> >         gsuite
> >         marketing_platform
> >     amazon
> >         aws
> >     microsoft
> >         azure
> >     apache
> >         cassandra
> >         druid
> >         hadoop
> >         hive
> >         pig
> >         pinot
> >         spark
> >         sqoop
> >     mysql
> >     jira
> >     databricks
> >     datadog
> >     dingding
> >     discord
> >     cloudant
> >     jenkins
> >     opsgenie
> >     qubole
> >     salesforce
> >     segment
> >     slack
> >     snowflake
> >     vertica
> >     zendesk
> >     celery
> >     docker
> >     bash
> >     kubernetes
> >     mssql
> >     mongodb
> >     mysql
> >     openfaas
> >     oracle
> >     papermill
> >     postgres
> >     presto
> >     python
> >     redis
> >     samba
> >     sqlite
> >     imap
> >     ssh
> >     filesystem
> >     sftp
> >     ftp
> >     http
> >     grpc
> >     smtp
> >     jdbc
> >     winrm
> >
> >
> >
> > On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > Let me then cancel this vote, and I will restart it next week.
> > >
> > > Yeah. It's a bit like re-opening Pandora's box, but now that we know
> > > that we can do it, and we are unblocked in moving to google (which is
> > > now the biggest move in progress), we can spend more time on getting
> > > better (and more final) consensus.
> > > I decided to go through the list from the docs (once again, Kamil -
> > > great that you did it) and prepared this spreadsheet showing the
> > > structure. I went through ALL the operators and put them in the place
> > > where our current rules put them.
> > >
> > > After this exercise, I think the following makes sense:
> > > - put all the stuff except fundamentals in *"providers"* (everything
> > > in "providers" will be potentially backportable).
> > > - group Apache projects under *"apache"* - similar to
> > > google/amazon/microsoft (a different kind of ownership, but still
> > > ownership)
> > > - for the rest, I think we can really put the operators in folders
> > > per "service/company" (without sub-packages). That includes
> > > sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]?).
> > > There is no "ownership" there and no reason to group them. That will
> > > put "operators/hooks/sensors" at different levels in the directory
> > > tree, but we already have that for fundamentals and I am not too
> > > worried about it. We do not have to have everything at the same level.
> > > - I placed transfer operators according to the rule where the "to"
> > > side is more important unless the other side is a public protocol (so
> > > sftp -> gcs and gcs -> sftp both go to google/gcp). I had no doubt
> > > where to put any of the transfer operators, so this is a good sign:
> > >
> > > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> > >
> > > Can you please take a look and express your opinions here so that we
> > > can have the final vote next week (for those who are not yet tired of
> > > the discussion ;)).
> > >
> > > J.
> > >
> > > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> > >
> > >> Yes, that makes sense.
> > >>
> > >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.bregula@polidea.com>
> > >> wrote:
> > >>
> > >> > In the case of Hadoop, it is published by Apache, so it can be in the
> > >> > apache directory. This will mimic the grouping presented in the
> > >> > documentation.
> > >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> > >> >
> > >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <ka...@gmail.com> wrote:
> > >> > >
> > >> > > I think we should keep the vote open at least until mid next week to
> > >> > > have more thought and input on this one.
> > >> > >
> > >> > > In general, I am happy with the approach, but operators/hooks and
> > >> > > sensors shouldn't be providers. "hadoop" can be its own provider and
> > >> > > hdfs can be a part of it.
> > >> > >
> > >> > > providers/
> > >> > >     google
> > >> > >         cloud
> > >> > >             operators
> > >> > >             hooks
> > >> > >             sensors
> > >> > >         gsuite
> > >> > >             operators
> > >> > >             ...
> > >> > >     amazon
> > >> > >         aws
> > >> > >             operators
> > >> > >             ...
> > >> > >     microsoft
> > >> > >         azure
> > >> > >             operators
> > >> > >             ...
> > >> > >     hadoop
> > >> > >         hdfs
> > >> > >             operators
> > >> > >             ...
> > >> > >
> > >> > > We can also define what a "provider" is, so we know what to add to it
> > >> > > in the future. SSH/FTP/SFTP belong to the same family. Do we want to
> > >> > > have separate providers for each one of them?
> > >> > >
> > >> > > Regards,
> > >> > > Kaxil
> > >> > >
> > >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I really like making everything a provider. That's a great idea!
> > >> > > > This way everything "backportable" will have to be in the "providers"
> > >> > > > package. Really nice and clean separation (and less mess in
> > >> > > > "airflow"). And we will not have to have any artificial grouping (we
> > >> > > > can still group them at the documentation level).
> > >> > > >
> > >> > > > We do not need backport in the name. And I think it's more of a
> > >> > > > technical detail on naming the package, which we can work out while
> > >> > > > reviewing PRs, and we can agree on the final naming of the released
> > >> > > > packages at the PMC level (PMCs will have to vote on releasing those).
> > >> > > >
> > >> > > > The thinking was that the packages are really intended only to be
> > >> > > > backported to 1.10 - we are not going (yet) to use them in Airflow
> > >> > > > 2.*, so I thought that by naming them backport we could express that
> > >> > > > intent more clearly.
> > >> > > >
> > >> > > > So let me clarify the structure of folders we are going to have if we
> > >> > > > follow it (I just added some examples), including the already agreed
> > >> > > > changes from AIP-21:
> > >> > > >
> > >> > > > providers/
> > >> > > >     google
> > >> > > >         cloud
> > >> > > >             operators
> > >> > > >             hooks
> > >> > > >             sensors
> > >> > > >         gsuite
> > >> > > >             operators
> > >> > > >             ...
> > >> > > >     amazon
> > >> > > >         aws
> > >> > > >             operators
> > >> > > >             ...
> > >> > > >     microsoft
> > >> > > >         azure
> > >> > > >             operators
> > >> > > >             ...
> > >> > > >     operators
> > >> > > >         sqlite.py
> > >> > > >         oracle.py
> > >> > > >         docker.py
> > >> > > >     hooks
> > >> > > >         hdfs.py
> > >> > > >         sqlite.py
> > >> > > >     sensors
> > >> > > >         http.py
> > >> > > >         sql.py
> > >> > > >
> > >> > > >
> > >> > > > J.
> > >> > > >
> > >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <ash@apache.org>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Do we need to include `-backport`? What was the thinking behind
> > >> > > > > that?
> > >> > > > >
> > >> > > > > I think software and protocol should be merged. I would also say
> > >> > > > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
> > >> > > > > for instance, is what I would prefer.
> > >> > > > >
> > >> > > > > -a
> > >> > > > >
> > >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >> > > > > wrote:
> > >> > > > > >One more day to go. I would love to see some opinions on this
> > >> > > > > >AIP-21 update :).
> > >> > > > > >
> > >> > > > > >Executive summary:
> > >> > > > > >
> > >> > > > > >* we will be moving a number of integrations to sub-packages of
> > >> > > > > >airflow.
> > >> > > > > >* they will be backportable to 1.10.*. There will be
> > >> > > > > >'apache-airflow-[package]-backport' packages, installable from PyPI
> > >> > > > > >with Python 3, that will make Airflow 2.0 operators/hooks etc.
> > >> > > > > >available with 1.10.* operators.
> > >> > > > > >* the current proposal for sub-packages is
> > >> > > > > >"protocols/software/providers/" (but if you think merging
> > >> > > > > >protocols and software makes sense, please express your opinion).
> > >> > > > > >* we are not moving "fundamental" operators/hooks etc.
> > >> > > > > >* Airflow 2.0 is still going to be installed as a single package
> > >> > > > > >with all operators (so we are not yet implementing AIP-8).
> > >> > > > > >
> > >> > > > > >J.
> > >> > > > > >
> > >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > >> > > > > >wrote:
> > >> > > > > >
> > >> > > > > >> I think all these cases are valid, but maybe I was not super clear.
> > >> > > > > >> It's only the transfer operators that we need to decide where to
> > >> > > > > >> put - not hooks. Usually the complexity of communication with
> > >> > > > > >> particular storages is (or at least should be) in the Hooks rather
> > >> > > > > >> than the Operators.
> > >> > > > > >>
> > >> > > > > >> Operators should be just thin wrappers over the logic in the hooks.
> > >> > > > > >> Hooks are going to stay where they belong - S3 Hooks in amazon,
> > >> > > > > >> GCS Hooks in google.cloud, GoogleSheet Hooks in google.gsuite.
> > >> > > > > >>
> > >> > > > > >> Since we actually have a mono-repo, it will be no problem (and no
> > >> > > > > >> cross-dependency problem) to have the S3 -> GCS operator in google
> > >> > > > > >> and use hooks from both google/amazon.
> > >> > > > > >>
> > >> > > > > >> I hope this alleviates your concern, Daniel?
> > >> > > > > >>
> > >> > > > > >> J.
> > >> > > > > >>
> > >> > > > > >>
> > >> > > > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
> > >> > > > > >>> the target, i.e. the storage? But GoogleSheetsToSftp would be in the
> > >> > > > > >>> google sheets operators file? The complexity, and the shared code, are
> > >> > > > > >>> in the gsheet component -- not in the storage destination.
> > >> > > > > >>>
> > >> > > > > >>>
> > >> > > > > >>
> > >> > > > > >>
> > >> > > > > >>
> > >> > > > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <Ja...@polidea.com>
> > >> > > > > >>> wrote:
> > >> > > > > >>>
> > >> > > > > >>> > Hello Airflow Community,
> > >> > > > > >>> >
> > >> > > > > >>> > This email calls for a vote to update AIP-21 Changes in import paths
> > >> > > > > >>> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > >> > > > > >>> > with the changes described below. The vote will last until Saturday
> > >> > > > > >>> > 8th 2am CEST (72 hours). Committers have a binding vote, but everyone
> > >> > > > > >>> > from the community is encouraged to cast an advisory vote.
> > >> > > > > >>> >
> > >> > > > > >>> > *Summary*:
> > >> > > > > >>> >
> > >> > > > > >>> > The proposal is to update AIP-21 to move all non-core
> > >> > > > > >>> > operators/hooks/sensors (and related files) to sub-packages within
> > >> > > > > >>> > airflow (protocols/software/providers) or (software/providers).
> > >> > > > > >>> > I am also happy to merge protocols+software, so if you have a strong
> > >> > > > > >>> > opinion on it, please state it with your vote and we can decide based
> > >> > > > > >>> > on the majority.
> > >> > > > > >>> >
> > >> > > > > >>> > Those packages will be released separately (schedule/process TBD) and
> > >> > > > > >>> > will be backportable to the 1.10.* Airflow series, so that users can
> > >> > > > > >>> > install them and start using new Airflow 2.0 operators in their
> > >> > > > > >>> > Python 3 Airflow 1.10 environments (only Python 3.5+ is supported).
> > >> > > > > >>> >
> > >> > > > > >>> > We will proceed with migrating the providers package to the already
> > >> > > > > >>> > agreed paths without waiting for the final vote (following the
> > >> > > > > >>> > current version of AIP-21). Since we have a working POC, we know the
> > >> > > > > >>> > agreed paths will work for us.
> > >> > > > > >>> >
> > >> > > > > >>> > *Previous discussions:*
> > >> > > > > >>> >
> > >> > > > > >>> >    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> > >> > > > > >>> >    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> > >> > > > > >>> >
> > >> > > > > >>> > *More Details*:
> > >> > > > > >>> >
> > >> > > > > >>> > 1) We are going in the direction of AIP-8 but not yet reaching it,
> > >> > > > > >>> > focusing on separating out backportable packages installable in
> > >> > > > > >>> > Airflow 1.10.* releases. Airflow 2.0 will still be installed as a
> > >> > > > > >>> > whole and all the source will be kept in one repo, but we now have a
> > >> > > > > >>> > way to build backportable packages for groups of operators. POC
> > >> > > > > >>> > available here: https://github.com/apache/airflow/pull/6507 (based on
> > >> > > > > >>> > Ash's https://github.com/ashb/airflow-submodule-test)
> > >> > > > > >>> >
> > >> > > > > >>> > 2) We move all integrations to new packages (keeping deprecated import
> > >> > > > > >>> > aliases in the old places), with the following split (according to
> > >> > > > > >>> > "stewardship" over the integrations):
> > >> > > > > >>> >
> > >> > > > > >>> >    - *fundamentals* - the core of Airflow; they are really part of
> > >> > > > > >>> >    Apache Airflow. Stewards: the core Airflow team. Not
> > >> > > > > >>> >    backportable/separated out.
> > >> > > > > >>> >    - *protocols* - not owned by anyone; they are public and the
> > >> > > > > >>> >    implementation is fully "open". There are no particular stewards
> > >> > > > > >>> >    (no need). Users of particular protocols should mainly maintain
> > >> > > > > >>> >    those and add support for different versions of the protocols.
> > >> > > > > >>> >    - *software* - both the API and the software are controlled by
> > >> > > > > >>> >    someone outside of Airflow (a commercial or open-source project),
> > >> > > > > >>> >    but the deployment of that software is "owned" by the user
> > >> > > > > >>> >    installing Airflow. The "stewards" might also be the users, but the
> > >> > > > > >>> >    controlling party (Oracle, for example) might be interested in
> > >> > > > > >>> >    maintaining those operators as well.
> > >> > > > > >>> >    - *providers* - API/software/deployments are fully controlled by a
> > >> > > > > >>> >    3rd party. Here the "provider" will most likely be interested in
> > >> > > > > >>> >    maintaining the operators (and, like Google for example, in
> > >> > > > > >>> >    providing integration guidelines
> > >> > > > > >>> >    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
> > >> > > > > >>> >    for their hooks/operators/sensors)
> > >> > > > > >>> >
> > >> > > > > >>> >
> > >> > > > > >>> > 3) Transfer operators between providers should be kept at the "target"
> > >> > > > > >>> > rather than the "source". For example, S3 -> GCS should be in the
> > >> > > > > >>> > "google" provider, but GCS -> S3 should be in "amazon".
> > >> > > > > >>> >
> > >> > > > > >>> > 4) Transfer operators with a provider on only one side should be kept
> > >> > > > > >>> > at the "provider", regardless of whether it is the target or the
> > >> > > > > >>> > source. For example, GCS -> SFTP and SFTP -> GCS should both be in
> > >> > > > > >>> > the "google" provider.
> > >> > > > > >>> >
> > >> > > > > >>> > 5) If in doubt, we will discuss individual cases separately.
> > >> > > > > >>> >
> > >> > > > > >>> > J.
> > >> > > > > >>> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
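Jarek's Nov 6 point quoted above (operators should be thin wrappers over hooks, so an S3 -> GCS operator can live in google while reusing hooks from both amazon and google) can be illustrated with a toy example. All classes and method names here are hypothetical stand-ins with in-memory storage; they do not mirror the signatures of the real Airflow `S3Hook`/`GCSHook`.

```python
class FakeS3Hook:
    """Hypothetical stand-in for an amazon hook: owns all S3-specific logic."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def read(self, key):
        return self._objects[key]


class FakeGCSHook:
    """Hypothetical stand-in for a google hook: owns all GCS-specific logic."""

    def __init__(self):
        self.blobs = {}

    def write(self, name, data):
        self.blobs[name] = data


class S3ToGCSOperator:
    """Thin transfer operator: it only wires the two hooks together."""

    def __init__(self, source_hook, target_hook, key, dest_name):
        self.source_hook = source_hook
        self.target_hook = target_hook
        self.key = key
        self.dest_name = dest_name

    def execute(self):
        # No storage logic here; both sides are delegated to their hooks.
        self.target_hook.write(self.dest_name, self.source_hook.read(self.key))


s3 = FakeS3Hook({"data/in.csv": b"a,b\n1,2\n"})
gcs = FakeGCSHook()
S3ToGCSOperator(s3, gcs, "data/in.csv", "data/out.csv").execute()
print(gcs.blobs)
```

Because the operator holds no storage logic of its own, where its file lives in the package tree is mostly a question of ownership, which is why the thread only needs placement rules for transfer operators.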

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kamil Breguła <ka...@polidea.com>.
I looked at this list and I'm only worried about two operators.

airflow.contrib.operators.vertica_to_hive
airflow.contrib.operators.s3_to_hive

If we want the operators to be grouped according to destination, then
these operators should be in the apache package. It is the members of the
Apache community who will care most about these operators being of high
quality. Apache can be treated equally with other large cloud
providers, such as GCP and AWS. I can imagine that a new Apache product
will appear and will want to be promoted the same way as the products of
cloud providers are: by creating a large number of integrations that
allow you to copy data into its ecosystem.
There is another consideration: building a strong Apache community. As
members of the Apache community, we should promote Apache products to
ensure that the community develops properly, and therefore also promote
integrating our products with other products.
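Read literally, the placement rule from the quoted Nov 8 message (the "to" side wins unless it is a public protocol) puts both operators mentioned above into the apache group once Hive is grouped under apache. A minimal sketch of that reading follows. The protocol set, the `GROUP_OF` table and both helper functions are hypothetical and only mirror the examples given in the thread.

```python
# Hypothetical classification; the real one lives in the project spreadsheet.
PUBLIC_PROTOCOLS = {"ftp", "sftp", "http", "imap", "smtp"}
GROUP_OF = {"gcs": "google", "s3": "amazon", "hive": "apache", "druid": "apache"}


def package_of(system: str) -> str:
    """Owning group for a system; ungrouped services get their own folder."""
    return GROUP_OF.get(system, system)


def transfer_home(source: str, target: str) -> str:
    """Package that should own a source -> target transfer operator."""
    # The "to" side wins, unless it is a public protocol and the "from" side is not.
    if target in PUBLIC_PROTOCOLS and source not in PUBLIC_PROTOCOLS:
        return package_of(source)
    return package_of(target)


for src, dst in [("s3", "gcs"), ("gcs", "s3"), ("gcs", "sftp"), ("vertica", "hive"), ("s3", "hive")]:
    print(f"{src} -> {dst}: {transfer_home(src, dst)}")
```

Under this reading, vertica_to_hive and s3_to_hive both resolve to apache, while S3 -> GCS, GCS -> S3 and GCS -> SFTP keep the placements given in the original proposal.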

On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <Ja...@polidea.com> wrote:
>
> Just to settle the "packages" for this update: does anyone have objections
> to this structure? Details, including transfer operators, are in:
>
> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
>
> *Fundamentals (no change)*
>
> providers
>     google
>         cloud
>         gsuite
>         marketing_platform
>     amazon
>         aws
>     microsoft
>         azure
>     apache
>         cassandra
>         druid
>         hadoop
>         hive
>         pig
>         pinot
>         spark
>         sqoop
>     mysql
>     jira
>     databricks
>     datadog
>     dingding
>     discord
>     cloudant
>     jenkins
>     opsgenie
>     qubole
>     salesforce
>     segment
>     slack
>     snowflake
>     vertica
>     zendesk
>     celery
>     docker
>     bash
>     kubernetes
>     mssql
>     mongodb
>     mysql
>     openfaas
>     oracle
>     papermill
>     postgres
>     presto
>     python
>     redis
>     samba
>     sqlite
>     imap
>     ssh
>     filesystem
>     sftp
>     ftp
>     http
>     grpc
>     smtp
>     jdbc
>     winrm
>
>
>
> On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > Let me then cancel this vote, and I will restart it next week.
> >
> > Yeah. It's a bit like re-opening Pandora's box, but now that we know
> > that we can do it, and we are unblocked in moving to google (which is
> > now the biggest move in progress), we can spend more time on getting
> > better (and more final) consensus.
> > I decided to go through the list from the docs (once again, Kamil -
> > great that you did it) and prepared this spreadsheet showing the
> > structure. I went through ALL the operators and put them in the place
> > where our current rules put them.
> >
> > After this exercise, I think the following makes sense:
> > - put all the stuff except fundamentals in *"providers"* (everything
> > in "providers" will be potentially backportable).
> > - group Apache projects under *"apache"* - similar to
> > google/amazon/microsoft (a different kind of ownership, but still
> > ownership)
> > - for the rest, I think we can really put the operators in folders
> > per "service/company" (without sub-packages). That includes
> > sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]?).
> > There is no "ownership" there and no reason to group them. That will
> > put "operators/hooks/sensors" at different levels in the directory
> > tree, but we already have that for fundamentals and I am not too
> > worried about it. We do not have to have everything at the same level.
> > - I placed transfer operators according to the rule where the "to"
> > side is more important unless the other side is a public protocol (so
> > sftp -> gcs and gcs -> sftp both go to google/gcp). I had no doubt
> > where to put any of the transfer operators, so this is a good sign:
> >
> > https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
> >
> > Can you please take a look and express your opinions here so that we
> > can have the final vote next week (for those who are not yet tired of
> > the discussion ;)).
> >
> > J.
> >
> > On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> >> Yes, that makes sense.
> >>
> >> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <ka...@polidea.com>
> >> wrote:
> >>
> >> > In the case of Hadoop, it is published by Apache, so it can be in the
> >> > apache directory.  This will mimic the grouping presented in the
> >> > documentation.
> >> >
> >> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
> >> >
> >> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <ka...@gmail.com> wrote:
> >> > >
> >> > > I think we should keep the vote open at least until mid next week to
> >> > > have more thought and input on this one.
> >> > >
> >> > > In general, I am happy with the approach, but operators/hooks and
> >> > > sensors shouldn't be a provider. "hadoop" can be the provider and hdfs
> >> > > can be a part of it.
> >> > >
> >> > > providers/
> >> > >     google
> >> > >          cloud
> >> > >              operators
> >> > >              hooks
> >> > >              sensors
> >> > >          gsuite
> >> > >              operators
> >> > >              ...
> >> > >     amazon
> >> > >          aws
> >> > >              operators
> >> > >              ...
> >> > >     microsoft
> >> > >          azure
> >> > >              operators
> >> > >              ...
> >> > >     hadoop
> >> > >         hdfs
> >> > >              operators
> >> > >              ...
> >> > >
> >> > > We can also define what a "provider" is, so we know what to add to it
> >> > > in the future. SSH/FTP/SFTP belong to the same family. Do we want to
> >> > > have separate providers for each one of them?
> >> > >
> >> > > Regards,
> >> > > Kaxil
> >> > >
> >> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >> > > wrote:
> >> > >
> >> > > > I really like making everything a provider. That's a great idea!
> >> > > > This way everything "backportable" will have to be in the
> >> > > > "providers" package. Really nice and clean separation (and less mess
> >> > > > in "airflow"). And we will not have to have any artificial grouping
> >> > > > (we can still group them at the documentation level).
> >> > > >
> >> > > > We do not need "backport" in the name. And I think it's more of a
> >> > > > technical detail of naming the package, which we can work out while
> >> > > > reviewing PRs, and we can agree on the final naming of the released
> >> > > > packages at the PMC level (PMCs will have to vote on releasing those).
> >> > > >
> >> > > > The thinking is that their intention is really to be backported only
> >> > > > to 1.10; we are not going (yet) to use the packages in Airflow 2.*,
> >> > > > so I thought that by naming them "backport" we could express that
> >> > > > intent more clearly.
> >> > > >
> >> > > > So let me clarify the structure of folders we are going to have if
> >> > > > we follow it (I just added some examples), including the already
> >> > > > agreed changes from AIP-21:
> >> > > >
> >> > > > providers/
> >> > > >     google
> >> > > >          cloud
> >> > > >              operators
> >> > > >              hooks
> >> > > >              sensors
> >> > > >          gsuite
> >> > > >              operators
> >> > > >              ...
> >> > > >     amazon
> >> > > >          aws
> >> > > >              operators
> >> > > >              ...
> >> > > >     microsoft
> >> > > >          azure
> >> > > >              operators
> >> > > >              ...
> >> > > >     operators
> >> > > >          sqlite.py
> >> > > >          oracle.py
> >> > > >          docker.py
> >> > > >     hooks
> >> > > >          hdfs.py
> >> > > >          sqlite.py
> >> > > >     sensors
> >> > > >          http.py
> >> > > >          sql.py
> >> > > >
> >> > > >
> >> > > > J.
> >> > > >
> >> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <as...@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > > > Do we need to include `-backport`? What was the thinking behind
> >> > > > > that?
> >> > > > >
> >> > > > > I think software and protocol should be merged. I would also say
> >> > > > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
> >> > > > > for instance, is what I would prefer.
> >> > > > >
> >> > > > > -a
> >> > > > >
> >> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >> > > > > wrote:
> >> > > > > >One more day to go. I would love to see some opinions on this
> >> > > > > >AIP-21 update :).
> >> > > > > >
> >> > > > > >Executive summary:
> >> > > > > >
> >> > > > > >* we will be moving a number of integrations to sub-packages of
> >> > > > > >airflow.
> >> > > > > >* they will be backportable to 1.10.*. There will be
> >> > > > > >'apache-airflow-[package]-backport' packages, installable from PyPI
> >> > > > > >with Python 3, that will make Airflow 2.0 operators/hooks etc.
> >> > > > > >available alongside the 1.10.* operators.
> >> > > > > >* the current proposal for sub-packages is
> >> > > > > >"protocols/software/providers/"
> >> > > > > >(but if you think merging protocols and software makes sense,
> >> > > > > >please express your opinion)
> >> > > > > >* we are not moving "fundamental" operators/hooks etc.
> >> > > > > >* Airflow 2.0 is still going to be installed as a single package
> >> > > > > >with all operators (so we are not yet implementing AIP-8)
> >> > > > > >
> >> > > > > >J.
> >> > > > > >
> >> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> >> > > > > >wrote:
> >> > > > > >
> >> > > > > >> I think all these cases are valid, but maybe I was not super-clear.
> >> > > > > >> It's only the transfer operators that we need to decide where to
> >> > > > > >> put, not hooks.
> >> > > > > >> Usually the complexity of communicating with particular storage
> >> > > > > >> systems is (or at least should be) in the Hooks rather than the
> >> > > > > >> Operators.
> >> > > > > >>
> >> > > > > >> Operators should be just thin wrappers over the logic in the hooks.
> >> > > > > >> Hooks are going to stay where they belong: S3 Hooks in amazon, GCS
> >> > > > > >> Hooks in google.cloud, GoogleSheet Hooks in google.gsuite.
> >> > > > > >>
> >> > > > > >> Since we actually have a mono-repo, it will be no problem (and no
> >> > > > > >> cross-dependency problem) to have the S3 -> GCS operator in google
> >> > > > > >> and use hooks from both google/amazon.
> >> > > > > >>
> >> > > > > >> I hope this alleviates your concern, Daniel?
> >> > > > > >>
> >> > > > > >> J.
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? Would you put these
> >> > > > > >>> in the target, i.e. the storage? But GoogleSheetsToSftp would be in
> >> > > > > >>> the google sheets operators file? The complexity, and the shared
> >> > > > > >>> code, are in the gsheet component, not in the storage destination.
> >> > > > > >>>
> >> > > > > >>>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <Ja...@polidea.com>
> >> > > > > >>> wrote:
> >> > > > > >>>
> >> > > > > >>> > Hello Airflow Community,
> >> > > > > >>> >
> >> > > > > >>> > The email calls for a vote to update AIP-21 Changes in import paths
> >> > > > > >>> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> >> > > > > >>> > with the changes described below. The vote will last till Saturday
> >> > > > > >>> > 8th 2am CEST (72 hours). Committers have a binding vote but everyone
> >> > > > > >>> > from the community is encouraged to cast an advisory vote.
> >> > > > > >>> >
> >> > > > > >>> > *Summary*:
> >> > > > > >>> >
> >> > > > > >>> > The proposal is to update AIP-21 to move all non-core
> >> > > > > >>> > operators/hooks/sensors (and related files) to sub-packages within
> >> > > > > >>> > airflow (protocols/software/providers) or (software/providers).
> >> > > > > >>> > I am also happy to merge protocols+software, so if you have a strong
> >> > > > > >>> > opinion on it, please state it with your vote and we can decide
> >> > > > > >>> > based on majority.
> >> > > > > >>> >
> >> > > > > >>> > Those packages will be released separately (schedule/process TBD)
> >> > > > > >>> > and will be backportable to the 1.10.* airflow series, so that users
> >> > > > > >>> > can install them and start using new Airflow 2.0 operators in their
> >> > > > > >>> > Python 3 Airflow 1.10 environments (only Python 3.5+ is supported).
> >> > > > > >>> >
> >> > > > > >>> > We will proceed with migrating the providers package to the already
> >> > > > > >>> > agreed paths without waiting for the final vote (following the
> >> > > > > >>> > current version of AIP-21). Since we have a working POC, we know the
> >> > > > > >>> > agreed paths will work for us.
> >> > > > > >>> >
> >> > > > > >>> > *Previous discussions: *
> >> > > > > >>> >
> >> > > > > >>> >    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> >> > > > > >>> >    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> >> > > > > >>> >
> >> > > > > >>> > *More Details*:
> >> > > > > >>> >
> >> > > > > >>> > 1) We are going in the direction of AIP-8 but not yet reaching it,
> >> > > > > >>> > focusing on separating out backportable packages installable in
> >> > > > > >>> > Airflow releases 1.10.*. Airflow 2.0 will still be installed as a
> >> > > > > >>> > whole and all the source will be kept in one repo, but we now have a
> >> > > > > >>> > way to build backportable packages for groups of operators. POC
> >> > > > > >>> > available here: https://github.com/apache/airflow/pull/6507 (based on
> >> > > > > >>> > Ash's https://github.com/ashb/airflow-submodule-test)
> >> > > > > >>> >
> >> > > > > >>> > 2) We move all integrations to new packages (keeping deprecated
> >> > > > > >>> > import aliases in the old places), using the following split
> >> > > > > >>> > (according to "stewardship" over the integrations):
> >> > > > > >>> >
> >> > > > > >>> >    - *fundamentals* - core of airflow - they are really part of
> >> > > > > >>> >    Apache Airflow. Stewards - core Airflow team. Not
> >> > > > > >>> >    backportable/separated out.
> >> > > > > >>> >    - *protocols* - are not owned by anyone, they are public and the
> >> > > > > >>> >    implementation is fully "open". There are no particular stewards
> >> > > > > >>> >    (no need). Users of particular protocols should mainly maintain
> >> > > > > >>> >    those and add support for different versions of the protocols.
> >> > > > > >>> >    - *software* - both API and software are controlled by someone
> >> > > > > >>> >    outside of Airflow (commercial or open-source project), but the
> >> > > > > >>> >    deployment of that software is "owned" by the user installing
> >> > > > > >>> >    Airflow. The stewards might also be the users, but the controlling
> >> > > > > >>> >    party (Oracle for example) might be interested in maintaining
> >> > > > > >>> >    those operators as well.
> >> > > > > >>> >    - *providers* - API/software/deployments are fully controlled by
> >> > > > > >>> >    a 3rd party. Here the "provider" will most likely be interested in
> >> > > > > >>> >    maintaining the operators (and, like Google for example, provide
> >> > > > > >>> >    integration guidelines
> >> > > > > >>> >    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
> >> > > > > >>> >    for their hooks/operators/sensors)
> >> > > > > >>> >
> >> > > > > >>> >
> >> > > > > >>> > 3) Between-providers transfer operators should be kept at the
> >> > > > > >>> > "target" rather than the "source".
> >> > > > > >>> > For example S3 -> GCS should be in the "google" provider, but
> >> > > > > >>> > GCS -> S3 should be in "amazon".
> >> > > > > >>> >
> >> > > > > >>> > 4) One-side provider transfer operators should be kept at the
> >> > > > > >>> > "provider" regardless of whether it is the target or the source.
> >> > > > > >>> > For example GCS -> SFTP or SFTP -> GCS should be in the "google"
> >> > > > > >>> > provider.
> >> > > > > >>> >
> >> > > > > >>> > 5) If in doubt we will discuss individual cases separately.
> >> > > > > >>> >
> >> > > > > >>> > J.
> >> > > > > >>> >
> >> > > > > >>> > --
> >> > > > > >>> >
> >> > > > > >>> > Jarek Potiuk
> >> > > > > >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> > > > > >>> >
> >> > > > > >>> > M: +48 660 796 129 <+48660796129>
> >> > > > > >>> > [image: Polidea] <https://www.polidea.com/>
> >> > > > > >>> >
> >> > > > > >>>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> >
> >>
> >
> >
> >
> >
>

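The exchange quoted above holds that transfer operators should be thin wrappers, with storage-specific logic kept in hooks, so an S3 -> GCS operator placed in google can still reuse the amazon hook. A minimal sketch of that layering, using hypothetical in-memory stand-ins (`FakeS3Hook`, `FakeGCSHook`, `S3ToGCSTransfer`) rather than Airflow's real hook and operator classes:

```python
from typing import Dict


class _InMemoryStore:
    """Shared in-memory storage used by the fake hooks below (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def download(self, key: str) -> bytes:
        return self._objects[key]


class FakeS3Hook(_InMemoryStore):
    """Hypothetical amazon-side hook: all S3-specific logic would live here."""


class FakeGCSHook(_InMemoryStore):
    """Hypothetical google-side hook: all GCS-specific logic would live here."""


class S3ToGCSTransfer:
    """Thin transfer 'operator': it only wires the two hooks together.

    Under the target-side rule it would sit in the google package while
    importing the amazon-side hook; in a mono-repo that cross-import is cheap.
    """

    def __init__(self, s3_hook: FakeS3Hook, gcs_hook: FakeGCSHook, key: str) -> None:
        self.s3_hook = s3_hook
        self.gcs_hook = gcs_hook
        self.key = key

    def execute(self) -> str:
        data = self.s3_hook.download(self.key)  # reading is the S3 hook's job
        self.gcs_hook.upload(self.key, data)    # writing is the GCS hook's job
        return self.key


if __name__ == "__main__":
    s3 = FakeS3Hook()
    s3.upload("report.csv", b"a,b\n1,2\n")
    gcs = FakeGCSHook()
    S3ToGCSTransfer(s3, gcs, "report.csv").execute()
    print(gcs.download("report.csv"))
```

Because the operator holds no storage logic of its own, moving it between packages only changes its import lines, not its behaviour.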
Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
To agree on the "packages" for this update: does anyone have objections to
this structure? Details, including the transfer operators, are in

https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0

*Fundamentals (no change)*

providers
    google
        cloud
        gsuite
        marketing_platform
    amazon
        aws
    microsoft
        azure
    apache
        cassandra
        druid
        hadoop
        hive
        pig
        pinot
        spark
        sqoop
    mysql
    jira
    databricks
    datadog
    dingding
    discord
    cloudant
    jenkins
    opsgenie
    qubole
    salesforce
    segment
    slack
    snowflake
    vertica
    zendesk
    celery
    docker
    bash
    kubernetes
    mssql
    mongodb
    mysql
    openfaas
    oracle
    papermill
    postgres
    presto
    python
    redis
    samba
    sqlite
    imap
    ssh
    filesystem
    sftp
    ftp
    http
    grpc
    smtp
    jdbc
    winrm

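The transfer-operator rules behind the spreadsheet (between two providers the target side wins; when the other end is a public protocol the provider side wins; anything else is discussed case by case) fit in a small lookup. A minimal sketch, assuming a hypothetical `PROVIDER_OF` excerpt and an assumed `PUBLIC_PROTOCOLS` set; neither is an agreed Airflow data structure:

```python
from typing import Dict, FrozenSet

# Hypothetical excerpt: integration -> provider package that owns it.
PROVIDER_OF: Dict[str, str] = {
    "s3": "amazon",
    "gcs": "google",
    "gsheets": "google",
}

# Assumed set of public protocols; these never "win" a transfer operator.
PUBLIC_PROTOCOLS: FrozenSet[str] = frozenset({"sftp", "ftp", "http"})


def transfer_package(source: str, target: str) -> str:
    """Return the provider package for a `source -> target` transfer operator."""
    source_is_protocol = source in PUBLIC_PROTOCOLS
    target_is_protocol = target in PUBLIC_PROTOCOLS
    if source_is_protocol and target_is_protocol:
        # No provider on either side: "if in doubt we will discuss individual cases".
        raise ValueError(f"{source} -> {target}: no provider side, decide case by case")
    if target_is_protocol:
        # One-sided transfer towards a protocol: the provider (source) side wins.
        return PROVIDER_OF[source]
    # Otherwise the "to" side wins (this also covers protocol -> provider).
    return PROVIDER_OF[target]


for src, dst in [("s3", "gcs"), ("gcs", "s3"), ("sftp", "gcs"), ("gcs", "sftp")]:
    print(f"{src} -> {dst}: {transfer_package(src, dst)}")
```

Under these assumptions it reproduces the examples from the thread: S3 -> GCS lands in google, GCS -> S3 in amazon, and both SFTP directions around GCS in google.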
On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Let me then cancel this vote and I will restart it next week.
>
> Yeah. It's a bit like re-opening Pandora's box, but now that we know
> that we can do it, and we are unblocked in moving to google (which is now
> the biggest move in progress), we can spend more time on getting a better
> (and more final) consensus.
> I decided to go through the list from the docs (once again, Kamil, great
> that you did it) and prepared this spreadsheet showing the structure. I
> went through ALL the operators and put each one where our current rules
> place it.
>
> After this exercise, I think the following makes sense:
> - put all the stuff except fundamentals in *"providers"* (everything
> in "providers" will be potentially backportable).
> - grouping apache projects under *"apache"* - similar to
> google/amazon/microsoft (different kind of ownership but still it is an
> ownership)
> - for the rest, I think we can simply put the operators in
> folders per "service/company" (without sub-packages). That includes
> sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]?).
> There is no "ownership" there and no reason to group them. That will put
> "operators/hooks/sensors" at different levels in the directory tree, but we
> already have that for fundamentals and I am not too worried about it. We
> do not have to have everything at the same level.
> - I placed transfer operators according to the rule that the "to" side is
> more important unless the other side is a public protocol (so sftp -> gcs
> and gcs -> sftp both go to google/gcp). I had no doubt about where to put
> any transfer operator, so this is a good sign:
>
>
> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
>
> Can you please take a look and express your opinions here so that we can
> have a final vote next week (for those who are not yet tired of the
> discussion ;)).
>
> J.
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
Let me cancel this vote then, and I will restart it next week.

Yeah. It's a bit like re-opening Pandora's box, but now that we know that
we can do it, and we are unblocked in moving operators to google (which is
now the biggest move in progress), we can spend more time on getting a
better (and more final) consensus.
I decided to go through the list from the docs (once again, Kamil, great
that you did it) and prepared a spreadsheet showing the structure. I went
through ALL the operators and put each one where our current rules place
it.

After this exercise, I think the following makes sense:
- put everything except the fundamentals in *"providers"* (everything in
"providers" will potentially be backportable).
- group Apache projects under *"apache"*, similar to google/amazon/microsoft
(a different kind of ownership, but still an ownership).
- for the rest, put the operators in folders per "service/company" (without
sub-packages). That includes sftp/ssh/ftp etc. (should we group [ftp and
sftp] or [ssh and sftp]?). There is no "ownership" there and no reason to
group them. That will put "operators/hooks/sensors" at different levels in
the directory tree, but we already have that for fundamentals and I am not
too worried about it. We do not have to have everything at the same level.
- I put transfer operators according to the rule that the "to" side is more
important, unless one side is a public protocol, in which case the
non-protocol side wins (so sftp -> gcs and gcs -> sftp both go to
google/gcp). I had no doubt where to put each transfer operator, which is a
good sign:

https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
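The transfer-operator rule in the last bullet can be sketched as a small decision function. This is only an illustration under stated assumptions: each endpoint is identified by its provider name (e.g. "google" for GCS, "amazon" for S3), the `PUBLIC_PROTOCOLS` set is a hypothetical stand-in for whatever list is finally agreed, and protocol-to-protocol transfers are left for case-by-case discussion.

```python
# Hypothetical sketch of the transfer-operator placement rule: the "to"
# (target) side decides the provider package, unless one side is a public
# protocol, in which case the non-protocol side wins.

# Illustrative assumption: which endpoints count as public protocols.
PUBLIC_PROTOCOLS = {"ftp", "http", "sftp", "ssh"}


def provider_for_transfer(source: str, target: str) -> str:
    """Return the provider package a `source -> target` operator would live in.

    `source` and `target` are provider names ("google", "amazon") or
    protocol names ("sftp"). Protocol-to-protocol transfers have no
    provider side, so they are rejected for separate discussion.
    """
    source_is_protocol = source in PUBLIC_PROTOCOLS
    target_is_protocol = target in PUBLIC_PROTOCOLS
    if source_is_protocol and target_is_protocol:
        raise ValueError(f"{source} -> {target}: discuss separately")
    if target_is_protocol:
        # e.g. google -> sftp: the provider (source) side wins
        return source
    # e.g. amazon -> google or sftp -> google: the target side wins
    return target


if __name__ == "__main__":
    for src, dst in [("amazon", "google"), ("google", "amazon"),
                     ("sftp", "google"), ("google", "sftp")]:
        print(f"{src} -> {dst}: {provider_for_transfer(src, dst)}")
```

Run as a script, it prints one line per direction; `google -> sftp` and `sftp -> google` both resolve to `google`, matching the example in the bullet.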

Can you please take a look and express your opinions here so that we can
have a final vote next week (for those who are not yet tired of the
discussion ;)).

J.

On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <ka...@gmail.com> wrote:

> Yes, that makes sense.
>
> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <ka...@polidea.com>
> wrote:
>
> > In the case of Hadoop, it is published by Apache, so it can be in the
> > apache directory. This will mimic the grouping presented in the
> > documentation.
> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kaxil Naik <ka...@gmail.com>.
Yes, that makes sense.

On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <ka...@polidea.com>
wrote:

> In the case of Hadoop, it is published by Apache, so it can be in the
> apache directory.  This will mimic the grouping presented in the
> documentation.
> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kamil Breguła <ka...@polidea.com>.
Hadoop is published by Apache, so it can go in the apache directory. This
will mirror the grouping presented in the documentation:
https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
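The grouping suggested above can be illustrated with a hypothetical helper that turns an integration name into a provider module path, placing Apache-published projects such as HDFS under `apache` and giving everything else its own per-service folder. The `APACHE_PROJECTS` set and the resulting paths are assumptions for illustration, not the agreed AIP-21 layout.

```python
# Hypothetical sketch: map an integration name to a provider module path.
# Apache-published projects are grouped under "apache"; everything else gets
# its own per-service/company folder.

# Illustrative assumption, not an agreed list.
APACHE_PROJECTS = {"cassandra", "hdfs", "hive", "pig", "spark"}
KINDS = {"operators", "hooks", "sensors"}


def provider_module(integration: str, kind: str) -> str:
    """Return a dotted path like 'airflow.providers.<group...>.<kind>'."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    name = integration.lower()
    if name in APACHE_PROJECTS:
        parts = ["airflow", "providers", "apache", name, kind]
    else:
        parts = ["airflow", "providers", name, kind]
    return ".".join(parts)


if __name__ == "__main__":
    print(provider_module("HDFS", "hooks"))      # airflow.providers.apache.hdfs.hooks
    print(provider_module("sftp", "operators"))  # airflow.providers.sftp.operators
```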

On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> I think we should keep the vote open at least until mid next week to have
> more thought and inputs on this one.
>
> In general, I am happy with the approach but operators/hooks and sensors
> shouldn't be a provider. "hadoop" can be its provider and hdfs can be a
> part of it.
>
> providers/
>     google
>          cloud
>              operators
>              hooks
>              sensors
>          gsuite
>              operators
>              ...
>     amazon
>          aws
>              operators
>              ...
>     microsoft
>          azure
>              operators
>              ...
>     hadoop
>         hdfs
>              operators
>              ...
>
> We can also define what a "provider" is, so we know what to add to it in
> the future. SSH/FTP/SFTP belong to the same family. Do we want to have
> separate providers for each one of them?
>
> Regards,
> Kaxil
>
> On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > I really like making everything a provider. That's a great idea! This way
> > everything "backportable" will have to be in "providers" package. Really
> > nice and clean separation (and less mess in "airflow"). And we will not
> > have to have any artificial grouping (we can still group them at the
> > documentation level).
> >
> > We do not need backport in the name. I think it's more of a technical
> > detail of naming the package, which we can work out while reviewing PRs,
> > and we can agree on the final naming of the released packages at the PMC
> > level (PMCs will have to vote on releasing those).
> >
> > The thinking is that the intention is really for them to be backported
> > only to 1.10; we are not going (yet) to use the packages in Airflow 2.*,
> > so I thought that by naming them backport we could express that intent
> > more clearly.
> >
> > So let me clarify the structure of folders we are going to have if we
> > follow it (i just added some examples) including the already agreed changes
> > from AIP-21:
> >
> > providers/
> >     google
> >          cloud
> >              operators
> >              hooks
> >              sensors
> >          gsuite
> >              operators
> >              ...
> >     amazon
> >          aws
> >              operators
> >              ...
> >     microsoft
> >          azure
> >              operators
> >              ...
> >     operators
> >          sqlite.py
> >          oracle.py
> >          docker.py
> >     hooks
> >          hdfs.py
> >          sqlite.py
> >     sensors
> >          http.py
> >          sql.py
> >
> >
> > J.
> >
> > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <as...@apache.org> wrote:
> >
> > > Do we need to include `-backport,`? What was the thinking behind that?
> > >
> > > I think software and protocol should be merged. I would also say
> > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator for
> > > instance is what I would prefer
> > >
> > > -a
> > >
> > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Ja...@polidea.com>
> > > wrote:
> > > >One more day to go. I would love to see some opinions on this AIP-21
> > > >update
> > > >:).
> > > >
> > > >Executive summary:
> > > >
> > > >* we will be moving a number of integrations to sub-packages of
> > > >airflow.
> > > >* they will be backportable to 1.10.*.  There will be
> > > >'apache-airflow-[package]-backport' pypi installable with python 3 that
> > > >will make Airflow 2.0 operators/hooks etc. available with 1.10*
> > > >operators.
> > > >* the current proposal for sub-packages is
> > > >"protocols/software/providers/"
> > > >(but if you think merging protocols and software makes sense - please
> > > >express your opinion
> > > >* we are not moving "fundamental" operators/hooks etc..
> > > >* Airflow 2.0 is still going to be installed as a single package with
> > > >all
> > > >operators (so we are not yet implementing AIP-8)
> > > >
> > > >J.
> > > >
> > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <Ja...@polidea.com>
> > > >wrote:
> > > >
> > > >> I think all this cases are valid but maybe I was not super-clear.
> > > >It's
> > > >> only the transfer operators that we need to decide where to put - not
> > > >> hooks.
> > > >> Usually the complexity of communication with particular storages is
> > > >(or at
> > > >> least should be) in the Hooks rather than Operators.
> > > >>
> > > >> Operators should be just thin wrappers over the logic in the hooks.
> > > >> Hooks are going to stay where they belong - S3 Hooks in amazon, GCS
> > > >Hooks
> > > >> in google.cloud, GoogleSheet Hooks in google.gsuite.
> > > >>
> > > >> Since we actually have mono-repo - this will be no problem (and no
> > > >cross
> > > >> dependencies problem) to have S3 -> GCS operator  in google and use
> > > >hooks
> > > >> from both google/amazon.
> > > >>
> > > >> I hope this alleviates your concern Daniel ?
> > > >>
> > > >> J.
> > > >>
> > > >>
> > > >>> What about GoogleSheetsToS3?  GoogleSheetsToGCS?  These you would
> > > >put in
> > > >>> the target, i.e. the storage?  But GoogleSheetsToSftp would be in
> > > >google
> > > >>> sheets operators file?  The complexity, and the shared code, are in
> > > >the
> > > >>> gsheet component -- not into the storage destination.
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk
> > > ><Ja...@polidea.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Hello Airflow Community,
> > > >>> >
> > > >>> > The email calls for a vote to update AIP-21 Changes in import
> > > >paths
> > > >>> > <
> > > >>> >
> > > >>>
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
> > > >>> > >
> > > >>> > with
> > > >>> > the changes described below. The vote will last till Saturday 8th
> > > >2am
> > > >>> CEST
> > > >>> > (72 hours). Committers have a binding vote but everyone from the
> > > >>> community
> > > >>> > is encouraged to cast an advisory vote.
> > > >>> >
> > > >>> > *Summary*:
> > > >>> >
> > > >>> > The proposal is to update AIP-21 to move all non-core
> > > >>> > operators/hooks/sensor (and related files) to sub-packages within
> > > >>> airflow
> > > >>> > (protocols/software/providers) or (software/providers).
> > > >>> > I am also happy to merge protocols+software, so if you have a
> > > >strong
> > > >>> > opinion on it - please state it with your vote and we can decide
> > > >based
> > > >>> on
> > > >>> > majority.
> > > >>> >
> > > >>> > Those packages will be separately released (schedule/process TBD)
> > > >and
> > > >>> will
> > > >>> > be backportable to 1.10.* airflow series, so that users can
> > > >install it
> > > >>> and
> > > >>> > start using new Airflow2.0 operators in their Python 3 Airflow
> > > >1.10
> > > >>> > environments (only Python 3.5+ is supported).
> > > >>> >
> > > >>> > We will proceed with migrating the providers package to already
> > > >agreed
> > > >>> > paths without waiting for the final vote (following current
> > > >version of
> > > >>> > AIP-21). Since we have working POC - we know the agreed paths will
> > > >work
> > > >>> for
> > > >>> > us.
> > > >>> >
> > > >>> > *Previous discussions: *
> > > >>> >
> > > >>> >    -
> > > >>> >
> > > >>> >
> > > >>>
> > > >
> > >
> > https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> > > >>> >    -
> > > >>> >
> > > >>> >
> > > >>>
> > > >
> > >
> > https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> > > >>> >
> > > >>> > *More Details*:
> > > >>> >
> > > >>> > 1) Information that we are going in the direction of AIP-8 but not
> > > >yet
> > > >>> > reaching it - focusing on separating out backportable packages
> > > >>> installable
> > > >>> > in Airflow releases 1.10.* . Airflow 2.0 will still be installed
> > > >as a
> > > >>> whole
> > > >>> > and all the source will be kept in one repo, but we now have a way
> > > >to
> > > >>> build
> > > >>> > backportable packages for groups of operators. POC available here:
> > > >>> > https://github.com/apache/airflow/pull/6507 (based on Ash's
> > > >>> > https://github.com/ashb/airflow-submodule-test)
> > > >>> >
> > > >>> > 2) We move all integrations to new packages (keeping deprecated
> > > >import
> > > >>> > aliases in the old places). The following split (according to
> > > >>> "stewardship"
> > > >>> > over the integrations):
> > > >>> >
> > > >>> >    - *fundamentals* - core of ariflow - they are really part of
> > > >Apache
> > > >>> >    Airflow. Stewards - core Airflow team. Not
> > > >backportable/separated
> > > >>> out.
> > > >>> >    - *protocols* - are not owned by anyone, they are public and
> > > >the
> > > >>> >    implementation is fully "open". There are no particular
> > > >stewards (no
> > > >>> > need).
> > > >>> >    Users of particular protocols should mainly maintain those and
> > > >add
> > > >>> > support
> > > >>> >    for different versions of the protocols.
> > > >>> >    - *software* - both API and software are controlled by someone
> > > >>> outside
> > > >>> >    of Airflow (commercial or open-source project), but the
> > > >deployment of
> > > >>> > that
> > > >>> >    software is "owned" by the user installing Airflow. The
> > > >"stewardship"
> > > >>> > might
> > > >>> >    be also the users but the controlling party (Oracle for
> > > >example)
> > > >>> might
> > > >>> > be
> > > >>> >    interested in maintaining those operators as well.
> > > >>> >    - *providers* - API/software/deployments are fully controlled
> > > >by a
> > > >>> 3rd
> > > >>> >    party. Here most likely "provider" will be interested in
> > > >maintaining
> > > >>> the
> > > >>> >    operators (and for example like Google - provide integration
> > > >>> guidelines
> > > >>> >    <
> > > >>> >
> > > >>>
> > > >
> > >
> > https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978
> > > >>> > >
> > > >>> > for
> > > >>> >    their hooks/operators/sensors)
> > > >>> >
> > > >>> >
> > > >>> > 3) Between-providers transfer operators should be kept at the
> > > >"target"
> > > >>> > rather than "source"
> > > >>> > For example S3 -> GCS should be in "google" provider, but GCS-> S3
> > > >>> should
> > > >>> > be in "amazon".
> > > >>> >
> > > >>> > 4) One-side provider transfer operators should be kept at the
> > > >"provider"
> > > >>> > regardless if they are target or source.
> > > >>> > For example GCS-> SFTP or SFTP -> GCS should be in "google"
> > > >provider.
> > > >>> >
> > > >>> > 5) If in doubt we will discuss individual cases separately.
> > > >>> >
> > > >>> > J.
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > Jarek Potiuk
> > > >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >>> >
> > > >>> > M: +48 660 796 129 <+48660796129>
> > > >>> > [image: Polidea] <https://www.polidea.com/>
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Jarek Potiuk
> > > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >>
> > > >> M: +48 660 796 129 <+48660796129>
> > > >> [image: Polidea] <https://www.polidea.com/>
> > > >>
> > > >>
> > > >
> > > >--
> > > >
> > > >Jarek Potiuk
> > > >Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > >M: +48 660 796 129 <+48660796129>
> > > >[image: Polidea] <https://www.polidea.com/>
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Kaxil Naik <ka...@gmail.com>.
I think we should keep the vote open until at least mid next week to get
more thoughts and input on this one.

In general, I am happy with the approach, but operators, hooks and sensors
shouldn't be providers themselves. "hadoop" can be a provider, and hdfs
can be part of it.

providers/
    google
         cloud
             operators
             hooks
             sensors
         gsuite
             operators
             ...
    amazon
         aws
             operators
             ...
    microsoft
         azure
             operators
             ...
    hadoop
         hdfs
             operators
             ...
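Because each directory in the tree above is a Python sub-package, the layout fixes the import paths users will write. A minimal sketch that walks a nested-dict encoding of this proposal and lists the resulting dotted paths; the `airflow.providers` prefix follows the examples in this thread, while the dict encoding and the `module_paths` helper are illustrative only, not part of Airflow.

```python
from typing import Dict, List, Optional

# Illustrative encoding of the proposed tree: every key is a directory (a
# Python sub-package); None marks a leaf package such as "operators".
PROPOSED: Dict[str, Optional[dict]] = {
    "google": {
        "cloud": {"operators": None, "hooks": None, "sensors": None},
        "gsuite": {"operators": None},
    },
    "amazon": {"aws": {"operators": None}},
    "microsoft": {"azure": {"operators": None}},
    "hadoop": {"hdfs": {"operators": None}},
}


def module_paths(
    layout: Dict[str, Optional[dict]], prefix: str = "airflow.providers"
) -> List[str]:
    """Return the dotted import path of every leaf package in the layout."""
    paths: List[str] = []
    for name, children in layout.items():
        dotted = prefix + "." + name
        if children is None:
            paths.append(dotted)
        else:
            # Recurse into the sub-package, extending the dotted prefix.
            paths.extend(module_paths(children, dotted))
    return paths


if __name__ == "__main__":
    for path in module_paths(PROPOSED):
        print(path)
```

Running it lists paths such as airflow.providers.hadoop.hdfs.operators, which is where an HDFS operator would be imported from under this layout.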

We could also define what a "provider" is, so we know what to add to it in
the future. SSH/FTP/SFTP belong to the same family. Do we want separate
providers for each of them?
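Defining what a provider is also decides where transfer operators go. Rules 3 and 4 of the original vote (a transfer between two providers lives with the target; a transfer with a provider on only one side lives with that provider) can be sketched as a small lookup. The `PROVIDERS` mapping is a hypothetical sample, and returning `None` for transfers with no provider on either side is this sketch's own convention for "not covered by rules 3 and 4".

```python
from typing import Optional

# Hypothetical sample mapping from a system name to the provider that owns it.
# Systems missing from the map (e.g. "sftp") count as non-provider
# integrations (protocols or software).
PROVIDERS = {
    "gcs": "google",
    "gsheets": "google",
    "s3": "amazon",
}


def transfer_home(source: str, target: str) -> Optional[str]:
    """Return the provider a source -> target transfer operator belongs to."""
    source_provider = PROVIDERS.get(source)
    target_provider = PROVIDERS.get(target)
    if source_provider and target_provider:
        # Rule 3: between two providers, keep the operator at the target.
        return target_provider
    # Rule 4: only one side is a provider, so keep it with that provider.
    # If neither side is a provider, this returns None (case left open).
    return source_provider or target_provider


if __name__ == "__main__":
    for src, dst in [("s3", "gcs"), ("gcs", "s3"), ("sftp", "gcs"), ("gsheets", "s3")]:
        print(f"{src} -> {dst}: {transfer_home(src, dst)}")
```

Under this rule a Google Sheets to S3 transfer lands in amazon even though the shared logic lives in the Google Sheets hook, which is the trade-off Daniel raised earlier in the thread.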

Regards,
Kaxil

On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> I really like making everything a provider. That's a great idea! This way
> everything "backportable" will have to be in the "providers" package. A
> really nice and clean separation (and less mess in "airflow"). And we will
> not need any artificial grouping (we can still group them at the
> documentation level).
>
> We do not need "backport" in the name. I think it's more of a technical
> detail of naming the package, which we can work out while reviewing PRs,
> and we can agree on the final naming of the released packages at the PMC
> level (the PMC will have to vote on releasing those).
>
> The thinking was that these packages are really only intended to be
> backported to 1.10 - we are not going (yet) to use them in Airflow 2.*, so
> I thought that by naming them "backport" we could express that intent more
> clearly.
>
> So let me clarify the folder structure we are going to have if we follow
> it (I just added some examples), including the changes already agreed in
> AIP-21:
>
> providers/
>     google
>          cloud
>              operators
>              hooks
>              sensors
>          gsuite
>              operators
>              ...
>     amazon
>          aws
>              operators
>              ...
>     microsoft
>          azure
>              operators
>              ...
>     operators
>          sqlite.py
>          oracle.py
>          docker.py
>     hooks
>          hdfs.py
>          sqlite.py
>     sensors
>          http.py
>          sql.py
>
>
> J.
>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
I really like making everything a provider. That's a great idea! This way
everything "backportable" will have to be in the "providers" package. A
really nice and clean separation (and less mess in "airflow"). And we will
not need any artificial grouping (we can still group them at the
documentation level).

We do not need "backport" in the name. I think it's more of a technical
detail of naming the package, which we can work out while reviewing PRs,
and we can agree on the final naming of the released packages at the PMC
level (the PMC will have to vote on releasing those).

The thinking was that these packages are really only intended to be
backported to 1.10 - we are not going (yet) to use them in Airflow 2.*, so
I thought that by naming them "backport" we could express that intent more
clearly.

So let me clarify the folder structure we are going to have if we follow
it (I just added some examples), including the changes already agreed in
AIP-21:

providers/
    google
         cloud
             operators
             hooks
             sensors
         gsuite
             operators
             ...
    amazon
         aws
             operators
             ...
    microsoft
         azure
             operators
             ...
    operators
         sqlite.py
         oracle.py
         docker.py
    hooks
         hdfs.py
         sqlite.py
    sensors
         http.py
         sql.py


J.

On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> Do we need to include `-backport`? What was the thinking behind that?
>
> I think software and protocols should be merged. I would also say
> _everything_ is a provider, so, for instance,
> airflow.providers.ssh.SSHOperator is what I would prefer.
>
> -a
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer


Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Ash Berlin-Taylor <as...@apache.org>.
Do we need to include `-backport`? What was the thinking behind that?

I think software and protocols should be merged. I would also say _everything_ is a provider, so, for instance, airflow.providers.ssh.SSHOperator is what I would prefer.

-a
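The original vote (point 2) proposes keeping deprecated import aliases in the old places after integrations move. A minimal sketch of one way such an alias could work, assuming the old module simply re-exports the class from its new location and emits a `DeprecationWarning`. The module names are hypothetical, not real Airflow paths, and both modules are built in memory so the snippet runs without Airflow installed; in a real tree each source string would be a file at the old and new paths.

```python
import sys
import types
import warnings
from typing import List, Tuple

# Hypothetical module names used only for this demo.
NEW_PATH = "demo_providers_ssh_operators"
OLD_PATH = "demo_contrib_ssh_operator"

# Source that would live at the new location.
NEW_SOURCE = '''
class SSHOperator:
    """Stand-in for an operator after its move to the providers package."""
'''

# Source left at the old location: re-export from the new path, then warn.
OLD_SOURCE = f'''
import warnings

from {NEW_PATH} import SSHOperator  # noqa: F401 (re-exported alias)

warnings.warn(
    "This module is deprecated. Please use `{NEW_PATH}`.",
    DeprecationWarning,
    stacklevel=2,
)
'''


def make_module(name: str, source: str) -> types.ModuleType:
    """Register an in-memory module under `name` and execute its source."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module


def load_with_alias() -> Tuple[
    types.ModuleType, types.ModuleType, List[warnings.WarningMessage]
]:
    """Create both modules and capture warnings emitted by the old path."""
    new_module = make_module(NEW_PATH, NEW_SOURCE)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        old_module = make_module(OLD_PATH, OLD_SOURCE)
    return old_module, new_module, list(caught)


if __name__ == "__main__":
    old, new, caught = load_with_alias()
    print(old.SSHOperator is new.SSHOperator)
    print(caught[0].category.__name__, caught[0].message)
```

The design choice here is that the old path exposes the very same class object, so existing DAGs keep working unchanged while the warning nudges users toward the new import path.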

On 8 November 2019 08:32:42 GMT, Jarek Potiuk <Ja...@polidea.com> wrote:
>One more day to go. I would love to see some opinions on this AIP-21
>update :).
>
>Executive summary:
>
>* we will be moving a number of integrations to sub-packages of
>airflow.
>* they will be backportable to 1.10.*. There will be
>'apache-airflow-[package]-backport' packages, installable from PyPI with
>Python 3, that will make Airflow 2.0 operators/hooks etc. available in
>Airflow 1.10.* environments.
>* the current proposal for sub-packages is
>"protocols/software/providers/" (but if you think merging protocols and
>software makes sense, please express your opinion).
>* we are not moving "fundamental" operators/hooks etc.
>* Airflow 2.0 is still going to be installed as a single package with
>all operators (so we are not yet implementing AIP-8).
>
>J.

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
One more day to go. I would love to see some opinions on this AIP-21 update :).

Executive summary:

* We will be moving a number of integrations to sub-packages of airflow.
* They will be backportable to 1.10.*. There will be
'apache-airflow-[package]-backport' packages, installable from PyPI with
Python 3, that make Airflow 2.0 operators/hooks etc. available in Airflow
1.10.*.
* The current proposal for sub-packages is "protocols/software/providers"
(but if you think merging protocols and software makes sense, please
express your opinion).
* We are not moving the "fundamental" operators/hooks etc.
* Airflow 2.0 is still going to be installed as a single package with all
operators (so we are not yet implementing AIP-8).
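For a DAG author on 1.10.*, one practical consequence of the backport packages is that the same operator may be importable either from a new sub-package path or from its old location. A minimal sketch of a helper that tries candidate import paths in order; the helper and its "module:Name" string format are hypothetical, and any Airflow paths you pass to it are assumptions, since the final layout was still being voted on.

```python
import importlib
from typing import Any, List, Sequence


def import_first(candidates: Sequence[str]) -> Any:
    """Return the first object importable from a list of "module.path:Name" strings.

    Hypothetical helper: try a new (backported) location first and fall back
    to the old one, collecting the errors in case nothing can be imported.
    """
    errors: List[str] = []
    for candidate in candidates:
        module_name, _, attr_name = candidate.partition(":")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{candidate}: {exc}")
    raise ImportError("none of the candidate paths could be imported: " + "; ".join(errors))
```

For example, a call listing a provider-style path such as "airflow.providers.ssh.operators.ssh:SSHOperator" before "airflow.contrib.operators.ssh_operator:SSHOperator" would prefer the backported class when it is installed; treat both paths as illustrative.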

J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Jarek Potiuk <Ja...@polidea.com>.
I think all these cases are valid, but maybe I was not clear enough. It's only
the transfer operators whose location we need to decide, not the hooks.
Usually the complexity of communicating with a particular storage is (or at
least should be) in the hooks rather than in the operators.

Operators should be just thin wrappers over the logic in the hooks.
Hooks are going to stay where they belong: S3 hooks in amazon, GCS hooks
in google.cloud, Google Sheets hooks in google.gsuite.

Since we have a mono-repo, it will be no problem (and will cause no
cross-dependency problems) to have the S3 -> GCS operator in google and use
hooks from both google and amazon.
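To illustrate the "thin wrapper" point: a transfer operator placed in the google package can simply compose hooks owned by both providers, so no storage-specific logic is duplicated. A minimal, self-contained sketch; `S3Hook`, `GCSHook` and `S3ToGCSOperator` here are hypothetical in-memory stand-ins, not Airflow's actual classes or signatures.

```python
from typing import Dict


class S3Hook:
    """Hypothetical stand-in for an amazon-owned hook (in-memory, no network)."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = objects

    def read(self, key: str) -> bytes:
        return self._objects[key]


class GCSHook:
    """Hypothetical stand-in for a google-owned hook (in-memory, no network)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class S3ToGCSOperator:
    """Would live in the google package (the target), but only wires hooks together."""

    def __init__(self, source_key: str, dest_key: str, s3_hook: S3Hook, gcs_hook: GCSHook):
        self.source_key = source_key
        self.dest_key = dest_key
        self.s3_hook = s3_hook
        self.gcs_hook = gcs_hook

    def execute(self) -> None:
        # All storage-specific logic stays in the hooks; the operator just moves bytes.
        data = self.s3_hook.read(self.source_key)
        self.gcs_hook.write(self.dest_key, data)
```

Because the operator holds almost no logic of its own, moving it between packages changes only its import location, which is why its placement matters less than the placement of the hooks.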

I hope this alleviates your concern, Daniel?

J.


> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
> the target, i.e. the storage? But GoogleSheetsToSftp would be in the Google
> Sheets operators file? The complexity, and the shared code, are in the
> gsheet component, not in the storage destination.
>
>




-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [VOTE] AIP-21 update for Airflow 1.10.* backportability

Posted by Daniel Standish <dp...@gmail.com>.
Re:

> For example S3 -> GCS should be in "google" provider, but GCS -> S3 should
> be in "amazon".
>

So if there were a BigQueryToS3 or SnowflakeToS3 operator, would you put
this in AWS?
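Applied mechanically, the proposal's rules 3) (a transfer between two providers goes to the target's provider) and 4) (a transfer with a provider on only one side goes to that provider) answer this question. A minimal sketch; the `PROVIDER_OF` mapping and the function name are hypothetical and cover only the examples discussed in this thread.

```python
from typing import Optional

# Hypothetical mapping of integrations to the provider package that would own
# them; only the examples from this thread are listed.
PROVIDER_OF = {
    "S3": "amazon",
    "GCS": "google",
    "BigQuery": "google",
    "GoogleSheets": "google",
}


def transfer_operator_home(source: str, target: str) -> Optional[str]:
    """Apply the proposed rules 3) and 4) to a source -> target transfer operator.

    Rule 3: both sides belong to providers -> use the target's provider.
    Rule 4: only one side belongs to a provider -> use that provider.
    Returns None when neither side is a provider (rule 5: discuss case by case).
    """
    source_provider = PROVIDER_OF.get(source)
    target_provider = PROVIDER_OF.get(target)
    if source_provider and target_provider:
        return target_provider
    return source_provider or target_provider
```

Under this reading both BigQueryToS3 and SnowflakeToS3 land in amazon: Snowflake is never the target here, so the result is the same whether or not it counts as a provider.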

I feel like storage should be a secondary consideration concerning object
naming.

Using Snowflake as an example, we might have export operator variations
like SnowflakeToS3, SnowflakeToGCS and SnowflakeToAzureBlobStorage. In my
view these would make sense in the same file as a BaseSnowflakeOperator, in
a Snowflake operators module, not in the target. The storage component
of this kind of operator is secondary.
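The alternative described above can be sketched under assumptions: the Snowflake-specific logic is written once in a base class in a Snowflake module, and each storage variant only supplies its destination. The class names, the `build_unload_sql` method and the generated SQL are hypothetical; the string merely resembles Snowflake's `COPY INTO <location>` unload statement and is not meant to be run as-is.

```python
class BaseSnowflakeExportOperator:
    """Hypothetical base class: everything Snowflake-specific is written once here."""

    url_scheme = ""  # set by each storage-specific subclass

    def __init__(self, table: str, bucket: str, prefix: str):
        self.table = table
        self.bucket = bucket
        self.prefix = prefix

    def destination(self) -> str:
        return f"{self.url_scheme}://{self.bucket}/{self.prefix}"

    def build_unload_sql(self) -> str:
        # Shared source-side logic; roughly modelled on an unload to an external location.
        return f"COPY INTO '{self.destination()}' FROM {self.table}"


class SnowflakeToS3Operator(BaseSnowflakeExportOperator):
    url_scheme = "s3"


class SnowflakeToGCSOperator(BaseSnowflakeExportOperator):
    url_scheme = "gcs"
```

An Azure Blob Storage variant would be one more small subclass overriding `destination`, since its URLs have a different shape. Under the proposal's rules 3) and 4), by contrast, these subclasses would be spread across the packages of the individual storage providers, which is the trade-off being debated here.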

What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
the target, i.e. the storage? But GoogleSheetsToSftp would be in the Google
Sheets operators file? The complexity, and the shared code, are in the
gsheet component, not in the storage destination.







On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Hello Airflow Community,
>
> This email calls for a vote to update AIP-21 (Changes in import paths,
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>)
> with the changes described below. The vote will last till Saturday 8th 2am
> CEST (72 hours). Committers have a binding vote, but everyone from the
> community is encouraged to cast an advisory vote.
>
> *Summary*:
>
> The proposal is to update AIP-21 to move all non-core
> operators/hooks/sensors (and related files) to sub-packages within airflow,
> either (protocols/software/providers) or (software/providers).
> I am also happy to merge protocols+software, so if you have a strong
> opinion on it, please state it with your vote and we will decide based on
> the majority.
>
> These packages will be released separately (schedule/process TBD) and will
> be backportable to the 1.10.* Airflow series, so that users can install them
> and start using the new Airflow 2.0 operators in their Python 3 Airflow 1.10
> environments (only Python 3.5+ is supported).
>
> We will proceed with migrating the providers package to the already agreed
> paths (following the current version of AIP-21) without waiting for the
> final vote. Since we have a working POC, we know the agreed paths will work
> for us.
>
> *Previous discussions:*
>
>    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>
> *More Details*:
>
> 1) We are going in the direction of AIP-8 but not yet reaching it: the focus
> is on separating out backportable packages installable in Airflow 1.10.*
> releases. Airflow 2.0 will still be installed as a whole and all the source
> will be kept in one repo, but we now have a way to build backportable
> packages for groups of operators. A POC is available here:
> https://github.com/apache/airflow/pull/6507 (based on Ash's
> https://github.com/ashb/airflow-submodule-test).
>
> 2) We move all integrations to new packages (keeping deprecated import
> aliases in the old places), with the following split (according to
> "stewardship" over the integrations):
>
>    - *fundamentals* - the core of Airflow; they are really part of Apache
>    Airflow. Stewards: the core Airflow team. Not backportable/separated out.
>    - *protocols* - not owned by anyone; they are public and the
>    implementation is fully "open". There are no particular stewards (none
>    are needed). Users of particular protocols should mainly maintain those
>    and add support for different versions of the protocols.
>    - *software* - both the API and the software are controlled by someone
>    outside of Airflow (a commercial or open-source project), but the
>    deployment of that software is "owned" by the user installing Airflow.
>    The stewards might also be the users, but the controlling party (Oracle,
>    for example) might be interested in maintaining those operators as well.
>    - *providers* - the API/software/deployments are fully controlled by a
>    3rd party. Here the "provider" will most likely be interested in
>    maintaining the operators (and, like Google for example, in providing
>    integration guidelines
>    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
>    for their hooks/operators/sensors).
>
>
> 3) Transfer operators between providers should be kept at the "target"
> rather than the "source". For example, S3 -> GCS should be in the "google"
> provider, but GCS -> S3 should be in "amazon".
>
> 4) Transfer operators with a provider on only one side should be kept at
> that "provider", regardless of whether it is the target or the source.
> For example, GCS -> SFTP or SFTP -> GCS should be in the "google" provider.
>
> 5) If in doubt, we will discuss individual cases separately.
>
> J.
>
>
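Point 2) of the quoted proposal keeps deprecated import aliases at the old module paths. A minimal sketch of one way to do that, assuming Python 3.7+ (PEP 562 module-level `__getattr__`); the helper and module names are hypothetical, and this is not necessarily how the POC implements it.

```python
import sys
import types
import warnings


def install_deprecated_alias_module(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Register a module under `old_name` that forwards attribute lookups to `new_module`.

    Every forwarded lookup emits a DeprecationWarning pointing at the new location.
    Relies on PEP 562 module-level __getattr__ (Python 3.7+).
    """
    shim = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Let the import system's own dunder probes (e.g. __path__) fail normally.
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"{old_name}.{attr} is deprecated; use {new_module.__name__}.{attr} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, attr)

    shim.__getattr__ = __getattr__  # consulted only for attributes missing from the shim
    sys.modules[old_name] = shim
    return shim
```

In a real layout the old module would be a file at the legacy path rather than an entry created at runtime; the runtime registration here only keeps the sketch self-contained.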