Posted to dev@airflow.apache.org by "Beck, Vincent" <vi...@amazon.com.INVALID> on 2023/02/02 20:47:23 UTC

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Hi all,
First of all, thank you for all the comments on the proposal above. This feedback is very valuable and really helped me move forward on the topic. The Airflow multi-tenant model is definitely a vast project, and I believe it is best done in several AIPs to keep the scope clear and reduce the cognitive load on reviewers. For now, I can see at least 3 different AIPs:

1. View level access control: introduce "Tenant" entity and allow grouping of users
2. Introduction of DAG group (exact concept name TBD): allow grouping of DAGs and impose access restrictions
3. Resource level access control: Modify resources access policies based on Tenants
We may decide to merge AIPs 2 and 3 or add new ones, depending on what we learn from the first one.
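
To make the three items above a bit more concrete, here is a minimal Python sketch of how a tenant (a group of users) and a DAG group (a group of DAGs with access restrictions) could fit together. This is only an illustration under assumptions: `Tenant`, `DagGroup`, and `can_access_dag` are hypothetical names, not existing Airflow classes or functions, and the real AIPs may model this quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tenant:
    """Hypothetical tenant entity: a named group of users."""

    name: str
    users: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class DagGroup:
    """Hypothetical DAG group: a set of DAG ids plus the tenants allowed to use them."""

    name: str
    dag_ids: frozenset[str]
    allowed_tenants: frozenset[str]


def can_access_dag(user: str, dag_id: str, tenants: list[Tenant], dag_groups: list[DagGroup]) -> bool:
    """Return True if the user belongs to a tenant that is allowed on a group containing the DAG."""
    user_tenants = {t.name for t in tenants if user in t.users}
    return any(
        dag_id in group.dag_ids and bool(user_tenants & group.allowed_tenants)
        for group in dag_groups
    )


# Example data (made up for illustration only).
tenants = [
    Tenant("analytics", frozenset({"alice"})),
    Tenant("ml", frozenset({"bob"})),
]
dag_groups = [
    DagGroup("reports", frozenset({"daily_report"}), frozenset({"analytics"})),
]
```

With this data, `can_access_dag("alice", "daily_report", tenants, dag_groups)` is true, while the same call for `"bob"` is false because the `ml` tenant is not listed on the `reports` group.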
To start off, I drafted an AIP on view-level access control; you can access it here: https://docs.google.com/document/d/1swNx_GTvUm456w8UKgQS1-CbGPu2OFr4l9lDaUnXbN8/edit?usp=sharing. If you agree with this proposal, would it be possible to get permissions to create an AIP?
Thank you,
Vincent

From: "Mehta, Shubham" <sh...@amazon.com>
Date: Tuesday, January 10, 2023 at 3:01 AM
To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Cc: "Beck, Vincent" <vi...@amazon.com>, "Mehta, Shubham" <sh...@amazon.com>
Subject: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Hi folks,

Over the past few weeks, @Vincent Beck and I have been working on a proposal for a multi-tenant model for Apache Airflow <https://docs.google.com/document/d/1n23h26p4_8F5-Cd0JGLPEnF3gumJ5hw3EpwUljz7HcE/edit?usp=sharing>. Building on AIP-43 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation> (DAG Processor separation) and AIP-44 <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API> (Airflow Internal API), we aim to modify the existing Role-Based Access Control (RBAC) to provide fine-grained access control and pave the way for running Airflow in a multi-tenant fashion.

Multi-tenancy support in Airflow would allow users to use a single Airflow environment to support multiple teams or business units, each with their own isolated workflows, user permissions, and data. This can offer a number of benefits including cost savings from a shared environment, improved collaboration among teams, and enhanced security through isolation, while also reducing the overall operational load.

In the proposal, we outline user requirements and describe the design for view-level and resource-level access control. We intentionally did not include technical implementation details, as these will be covered in AIPs after alignment. The proposal also includes open questions and recommendations. We would like to thank Jarek, Filip, and Kaxil for providing early feedback, helping to ensure the design has no obvious flaws.

Please review the proposal and provide your feedback by January 18th. We will then proceed to draft AIPs with implementation details based on the final proposal.

Regards
Shubham Mehta

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Beck, Vincent" <vi...@amazon.com.INVALID>.
Hey Vikram,

Don’t worry about the delay and thanks for sharing your thoughts!

My overall feeling here tends to agree with you (after a discussion with Jarek, I confess). I like this idea of moving user management out to external providers; it allows more features and more user management models to be implemented. Overall, I feel this is the right way to go in the long run. I am still not clear on some implementation details, but I guess it is too early for that (if we choose this direction).
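
For readers trying to picture what "user management as a provider" could mean in code, here is a minimal Python sketch, under assumptions. `UserManagementProvider`, `BuiltInProvider`, and `ExternalProvider` are hypothetical names invented for this illustration; none of them is an existing Airflow interface, and the actual design was still open at this point in the thread. The permission example ("John can-edit dag1 and can-view dag2") is taken from the questions raised elsewhere in this discussion.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Callable


class UserManagementProvider(ABC):
    """Hypothetical plug-in interface; not an existing Airflow class."""

    @abstractmethod
    def is_authorized(self, user: str, action: str, resource: str) -> bool:
        """Return True if `user` may perform `action` on `resource`."""


class BuiltInProvider(UserManagementProvider):
    """Simple in-process implementation, standing in for the built-in behaviour."""

    def __init__(self, grants: dict[str, set[tuple[str, str]]]) -> None:
        # Maps a user name to the (action, resource) pairs granted to that user.
        self._grants = grants

    def is_authorized(self, user: str, action: str, resource: str) -> bool:
        return (action, resource) in self._grants.get(user, set())


class ExternalProvider(UserManagementProvider):
    """Delegates every decision to a callable that would wrap an external system."""

    def __init__(self, decide: Callable[[str, str, str], bool]) -> None:
        self._decide = decide

    def is_authorized(self, user: str, action: str, resource: str) -> bool:
        return bool(self._decide(user, action, resource))


# "John can-edit dag1 and can-view dag2", expressed with the built-in provider.
builtin = BuiltInProvider({"john": {("can_edit", "dag1"), ("can_view", "dag2")}})

# A stand-in for an external identity system: only "admin" is allowed anything.
external = ExternalProvider(lambda user, action, resource: user == "admin")
```

The point of the sketch is only that the rest of the code would call `is_authorized` without knowing which implementation is configured.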

The changes suggested in our proposal would basically be shifted outside of Airflow, so I am trying to convince myself that we will still be able to cherry-pick some of the suggestions we proposed (as opposed to losing them entirely).

However, the change you are proposing is quite impactful in terms of Airflow's architecture and direction, so I would really love to hear other feedback and opinions on the topic.

Vincent

On 2023-02-13, 2:04 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    Hey Vikram,

    I think it's brilliant and I wonder how it happened that it had not
    occurred to us earlier. And I believe that is due to the natural
    tendency of "following as we always did" rather than thinking
    completely out-of-the-box. Thanks Vikram for bringing it up.

    The funny thing is that when I see this:

    > However, I don't agree that this level of user management belongs in "Core Airflow".

    I almost immediately think - NOOOOO, why, it's always been here, how
    can we remove it?

    But then if you look a bit closer:

    > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.

    Then it starts to make way more sense. Way more.

    And when you look further:

    >  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?

    My heart jumps and I am immediately sold on the idea.

    When I was commenting on the doc initially, something was not right.
    I had a feeling it was probably the 5th time I was looking at and
    commenting on a similar document. And, well, I did, actually. Most of
    the things we discussed there are already implemented out there. We
    just need to make sure we expose enough of the API to use them. For
    example, we have Keycloak, which is an open-source implementation of
    Identity and Access Management, with everything out there already
    integrated, and I've been part of the project that integrated just the
    authentication part. Now if we rethink the authorization and make it
    simpler and "externally driven", this will not only be faster IMHO,
    but will also allow enterprise users to integrate much better.

    I believe following the path that Vikram outlined will be a good
    direction for everyone in the community - including all the Managed
    Service providers, who will have a far easier job integrating
    Airflow into their authentication models.

    J.



    On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
    <vi...@astronomer.io.invalid> wrote:
    >
    > Shubham and Vincent,
    >
    > Let me start by saying that I apologize for my delayed response to your original email.
    >
    > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
    >
    > However, I don't agree that this level of user management belongs in "Core Airflow".
    >
    > I strongly believe that the core Airflow mission is for the community at large and for data practitioners, whether individuals or teams within enterprises. Therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But I think there is a never-ending list of user management features needed to support enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
    >
    > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
    >
    > Best regards,
    > Vikram
    >
    >
    > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
    >>
    >> Thanks
    >>
    >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>
    >>     Added.
    >>
    >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
    >>     <vi...@amazon.com.invalid> wrote:
    >>     >
    >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
    >>     >
    >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>     >
    >>     >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
    >>     >
    >>


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Beck, Vincent" <vi...@amazon.com.INVALID>.
Hi all,

I started a discussion on extracting user management from core Airflow here: https://github.com/apache/airflow/discussions/29986. Feel free to jump into the conversation if you're interested in the topic.

On 2023-02-21, 5:08 PM, "Mehta, Shubham" <shubhx@amazon.com> wrote:


@Jarek - thank you for your initial deep dive on Keycloak. It looks very promising and is likely the open-source provider we should adopt for Multi-tenancy support. We can decide this at a later stage once we finish step 1.


>1) For existing users/those who want to keep all "in-airflow-ui" they could use FAB Provider (which will be separated from the Core). Same as today, but without the advanced management features for groups and tenants. We might consider dropping that altogether eventually.
As the next step, we will do a deep dive on separating the FAB provider and designing the Airflow Authorization API. We will share our findings with the community as a GitHub discussion and may even do a PoC if necessary.


@Community, if you are an expert in authorization or FAB and interested in collaborating on this effort, please contact me or @Beck, Vincent (here or on Airflow Slack). We will be happy to work together to make Multi-tenancy in Airflow a reality.


Shubham


On 2023-02-17, 5:45 AM, "Jarek Potiuk" <jarek@potiuk.com> wrote:


And to Kaxil's mail: yep. What you wrote is exactly what I understood
needs to be done.


On Fri, Feb 17, 2023 at 2:40 PM Jarek Potiuk <jarek@potiuk.com> wrote:
>
> > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
>
> I do not think this will happen. I think part of the effort should be
> not only to implement the API but also to provide a fully fledged
> (though simple) implementation of such a provider which works with an
> open-source implementation of identity - KeyCloak is one that comes to
> my mind. It's possibly jumping ahead a bit to say "let's use KeyCloak
> as the reference provider we can release", but I think KeyCloak has all
> we need:
> * Integration with multiple authentication providers and protocols
> * User management:
> https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html
> * Role management including user mapping:
> https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html
> * Group management:
> https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html
>
> It comes with a management console, CLI and much more
> (auditing/session management etc. etc.)
>
> In a way it would simply provide very much the same as what the FAB
> Security Manager does, but with a much more complete scope and - most
> importantly - it would not be "part of Airflow as FAB is", it would be
> "outside" of it, and the only thing Airflow would provide is merely
> pointers to the Keycloak docs on how to integrate it with Airflow
> as a proxy: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html
> (or it could be done by writing an Airflow KeyCloak Adapter - to be
> decided which would be easier to maintain). The users will be free to
> configure the KeyCloak proxy as they see fit. No DB needed in Airflow to
> manage any of those, no UI, no API, no CLI - all that delegated out
> and integrated via incoming headers or adapter.
>
> The users will have several choices:
>
> 1) For existing users/those who want to keep all "in-airflow-ui" they
> could use FAB Provider (which will be separated from the Core). Same
> as today, but without the advanced management features for groups and
> tenants. We might consider dropping that altogether eventually.
> 2) If they are on premise - they can use KeyCloak Provider - by
> following our advice/suggestions/simple guidelines on how to
> integrate. They would have to manage their own KeyCloak instance (it
> won't be a "standard" part of Airflow).
> 3) If the user runs on AWS/Azure/GCP/others - each cloud would
> (hopefully) develop its own provider to integrate with IAM etc. -
> they could use that provider directly. Or they could use and manage
> their own KeyCloak in the cloud as they see fit (it supports OAuth
> integration with all the clouds). Or develop their own provider.
> 4) Those on managed services will have no choice but to use the
> provider installed by their service.
>
> I think all of that gives the user a choice - if they want role
> management and multi-tenant capabilities, fine, but they will have to
> manage the users outside of Airflow and integrate Airflow with it (and
> they can either integrate with what they have already or use
> KeyCloak). And it does not really impair them.
>
> J,
>
>
> On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham
> <shubhx@amazon.com.invalid> wrote:
> >
> > Thanks, Kaxil – that helped to clarify the proposal a bit more.
> >
> > > Replacing Access Control provided by FAB with a base/core security model (that is still resource-based)
> >
> > Are you suggesting that we build this resource-driven security model directly into Airflow, without relying on external dependencies like FAB?
> >
> > > Extend this to the other Airflow components (scheduler, workers, triggerer, cli)
> >
> > Are there cases where the scheduler or CLI would require the authorization API? Since they are considered trusted components, I assumed they would not need it.
> >
> >
> > Jarek - as always, I appreciate you sharing your thoughts and having an open discussion.
> >
> > > Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others". I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
> >
> > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
> >
> > I am still unclear about other user scenarios related to user management, besides multi-tenancy, that Airflow customers are looking to enable. While the extensibility we aim for will enable this, is there a need for it? Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in building a custom user management provider that works with your platform? Have there been cases where your customers were limited by the current permissioning model, and you considered replacing FAB?
> >
> > I believe that the primary motivation for "user management provider" is driven by the excitement around getting rid of FAB, which I think we can still achieve while including multi-tenancy in the core Airflow. Both should be treated as separate problems.
> >
> > References:
> > 1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice
> >
> > On 2023-02-14, 12:44 PM, "Jarek Potiuk" <jarek@potiuk.com> wrote:
> >
> > A comment on Shubham's question:
> >
> > > In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
> >
> > I am glad you asked. I think this is one of the things I wanted to
> > achieve by adding this page:
> > https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
> > - it will be live in 2.6, and one of the main parts is this one:
> >
> > https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities
> >
> > Which really explains what "Airflow as a Platform" is all about. I do
> > not think we already know all the parts that should be converted into
> > "Airflow extendability". It's more of an incremental effort like that
> > where we have those bright ideas "Hey - this part can be removed and
> > delegated to others". I think this has never been formulated
> > explicitly but I think for quite a while we are really in the mode
> > where we think much more about what we can SPLIT OUT from Airflow
> > rather than what we can ADD to Airflow.
> >
> > When you look at it, this is also the main idea behind Open Lineage
> > integration, for example - we are adding Open Lineage (which is really
> > just an API) so that others can build "everything-lineage" on top of
> > it. So we are adding a minimum-possible set of APIs and integration so
> > that we can expose the lineage capability so that all the lineage "UI"
> > and other use cases that lineage exposes would be done outside. We are
> > in a strong position to do it - being sure that when we expose it,
> > others will implement the integration they care about.
> >
> > I think more and more (and it has been preached mostly by Ash, but
> > also by others) that we should be focusing solely on being an extremely
> > powerful and robust scheduler and make sure we are exposing all of the
> > possible things that can be exposed as an external API (while still
> > providing a basic implementation that makes Airflow still a "finished"
> > product that can be used to handle basic cases).
> >
> > BTW. We are now preparing for the Airflow Summit CFP (some
> > announcements will follow shortly, I do not want to spill too many
> > beans) and we have a very interesting broad category "Airflow and
> > ...." . And I think we should work in the direction that the `...` is
> > far bigger than Airflow itself.
> >
> > J.
> >
> > On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > >
> > > Great idea Vikram, I love the idea of making this a provider/pluggable.
> > >
> > > In some ways, we already have a pluggable mechanism for authentication with Auth Backends [1]. Where we will need a lot more work, I think, is:
> > >
> > > 1. Replacing the access control provided by FAB with a base/core security model (that is still resource-based) [2].
> > > 2. Extending this to the other Airflow components (scheduler, workers, triggerer, CLI), or making them all driven by a single API that takes care of auth. This will also reduce a lot of code duplication across many of the components.
> > > 3. For backwards compatibility, we could ship a FAB provider that still uses Flask-AppBuilder, in addition to our recommended provider, which will have more features; users/companies/stakeholders can build on top of that provider to extend it further.
> > >
> > >
> > > References:
> > > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
> > > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
> > >
> > > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <shubhx@amazon.com.invalid> wrote:
> > >>
> > >> Hi Vikram,
> > >> Thank you for taking the time to review the proposal. I appreciate your insights — I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.
> > >>
> > >> Regarding the separation of user management, I understand your concerns and, at a high level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
> > >> 1. How will the user-id/group-id interface interact with Airflow resource-level permissions? Which parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
> > >> 2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
> > >> 3. What are the limitations of this new pluggable model compared to FAB? Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
> > >> 4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
> > >> 5. What will the migration process look like for existing users to this non-FAB pluggable model?
> > >>
> > >> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
> > >>
> > >> Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMC members. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.
> > >>
> > >> Regards
> > >> Shubham
> > >>






Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Mehta, Shubham" <sh...@amazon.com.INVALID>.
@Jarek - thank you for your initial deep dive on Keycloak. It looks very promising and is likely the open-source provider we should adopt for Multi-tenancy support. We can decide this at a later stage once we finish step 1.

>1) For existing users/those who want to keep all "in-airflow-ui" they could use FAB Provider (which will be separated from the Core). Same as today, but without the advanced management features for groups and tenants. We might consider dropping that altogether eventually.
As the next step, we will do a deep dive on separating the FAB provider and designing the Airflow Authorization API. We will share our findings with the community as a GitHub discussion and may even do a PoC if necessary.
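
As a rough illustration of what an externally driven authorization check could look like, here is a small Python sketch in which identity arrives through request headers set by a trusted proxy, as suggested elsewhere in this thread. Everything here is an assumption made for illustration: the header names `X-Forwarded-User` and `X-Forwarded-Groups`, the `Identity` type, and both functions are hypothetical and are not part of Airflow or of any Keycloak API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed header names; a real deployment would use whatever its proxy sets.
USER_HEADER = "X-Forwarded-User"
GROUPS_HEADER = "X-Forwarded-Groups"


@dataclass(frozen=True)
class Identity:
    """Hypothetical identity resolved outside of Airflow."""

    user: str
    groups: frozenset[str]


def identity_from_headers(headers: Mapping[str, str]) -> Optional[Identity]:
    """Build an Identity from proxy headers, or return None if no user is present."""
    # HTTP header names are case-insensitive, so normalise before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    user = normalized.get(USER_HEADER.lower(), "").strip()
    if not user:
        return None
    raw_groups = normalized.get(GROUPS_HEADER.lower(), "")
    groups = frozenset(g.strip() for g in raw_groups.split(",") if g.strip())
    return Identity(user=user, groups=groups)


def is_authorized(identity: Optional[Identity], required_group: str) -> bool:
    """Allow the request only if an identity exists and belongs to the required group."""
    return identity is not None and required_group in identity.groups
```

In this shape Airflow would keep no user database of its own: it would only read what the proxy asserts and compare it against the group required for a resource. This is only safe if the proxy is the sole entry point and strips these headers from incoming client requests.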

@Community, if you are an expert in authorization or FAB and interested in collaborating on this effort, please contact me or @Beck, Vincent (here or on Airflow Slack). We will be happy to work together to make Multi-tenancy in Airflow a reality.

Shubham

On 2023-02-17, 5:45 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    And to Kaxil's mail: yep. What you wrote is exactly what I understood
    needs to be done.

    On Fri, Feb 17, 2023 at 2:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:
    >
    > > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
    >
    > I do not think this will happen. I think part of the effort should be
    > not only to implement the API but also to provide a fully fledged (though
    > simple) implementation of such a provider which works with an
    > open-source implementation of identity - KeyCloak is one that comes to
    > my mind. It's possibly jumping ahead a bit to say "let's use KeyCloak
    > as reference provider we can release", but I think KeyCloak has all we
    > need:
    > * integration with multiple authentication providers and protocols
    > * User Management:
    > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html
    > * Role Management including user mapping:
    > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html
    > * Group management:
    > https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html
    >
    > It comes with a management console, CLI and much more
    > (auditing/session management etc. etc.)
    >
    > In a way it would simply provide very much the same as what the FAB
    > Security Manager does, but with much more complete scope and - most
    > importantly - it would not be "part of Airflow as FAB is", it would be
    > "outside" of it and the only thing Airflow would provide is merely
    > pointers to the Docs of Keycloak on how to integrate it with Airflow
    > as a proxy: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html
    > (or it could be done by writing an Airflow KeyCloak Adapter - to be
    > decided what would be easier to maintain). The users will be free to
    > configure KeyCloak proxy as they see fit. No DB needed in Airflow to
    > manage any of those, no UI, no API, no CLI - all that delegated out
    > and integrated via incoming headers or adapter.
    >
    > The users will have several choices:
    >
    > 1) For existing users/those who want to keep all "in-airflow-ui"  they
    > could use FAB Provider (which will be separated from the Core). Same
    > as today, but without the advanced management features for groups and
    > tenants. We might consider dropping that altogether eventually.
    > 2) If they are on premise - they can use KeyCloak Provider - by
    > following our advice/suggestions/simple guidelines on how to
    > integrate. They would have to manage their own KeyCloak instance (it
    > won't be a "standard" part of Airflow).
    > 3) If the user runs on AWS/Azure/GCP/others - each cloud would
    > (hopefully) develop their own provider to integrate with IAM etc. -
    > they could use that provider directly. Or they could use and manage
    > their KeyCloak in the cloud as they see fit (it supports all the
    > clouds' OAuth integration). Or develop their own provider.
    > 4) Those on managed services will have no choice but to use the
    > provider installed by their service.
    >
    > I think that all gives the user the choice - if they want role
    > management and multi-tenant capabilities, fine, but they will have to
    > manage the users outside of Airflow and integrate Airflow with it (and
    > they can either integrate with what they have already or use
    > KeyCloak). And this does not really impair them.
    >
    > J.
    >
    >
    > On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham
    > <sh...@amazon.com.invalid> wrote:
    > >
    > > Thanks, Kaxil – that helped to clarify the proposal a bit more.
    > >
    > > > Replacing Access Control provided by FAB with a base/core security model (that is still resource-based)
    > >
    > > Are you suggesting that we build this resource-driven security model directly into Airflow, without relying on external dependencies like FAB?
    > >
    > > > Extend this to the other Airflow components (scheduler, workers, triggerer, cli)
    > >
    > > Are there cases where the scheduler or CLI would require the authorization API? Since they are considered trusted components, I assumed they would not need it.
    > >
    > >
    > > Jarek - as always, I appreciate you sharing your thoughts and having an open discussion.
    > >
    > > > Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others".  I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
    > >
    > > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
    > >
    > > I am still unclear about other user scenarios related to user management, besides multi-tenancy, that Airflow customers are looking to enable. While the extensibility we aim for will enable this, is there a need for it? Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in building a custom user management provider that works with your platform? Have there been cases where your customers were limited by the current permissioning model, and you considered replacing FAB?
    > >
    > > I believe that the primary motivation for "user management provider" is driven by the excitement around getting rid of FAB, which I think we can still achieve while including multi-tenancy in the core Airflow. Both should be treated as separate problems.
    > >
    > > References:
    > > 1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice
    > >
    > > On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    > >
    > >     Comment to Shubham's question:
    > >
    > >     > In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
    > >
    > >     I am glad you asked. I think this is one of the things I wanted to
    > >     achieve by adding this page
    > >     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
    > >     - it will be live in 2.6 and one of the main parts is this one:
    > >
    > >     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities
    > >
    > >     Which really explains what "Airflow as a Platform" is all about. I do
    > >     not think we already know all the parts that should be converted into
    > >     "Airflow extendability". It's more of an incremental effort like that
    > >     where we have those bright ideas "Hey - this part can be removed and
    > >     delegated to others".  I think this has never been formulated
    > >     explicitly but I think for quite a while we are really in the mode
    > >     where we think much more about what we can SPLIT OUT from Airflow
    > >     rather than what we can ADD to Airflow.
    > >
    > >     When you look at it, this is also the main idea behind the OpenLineage
    > >     integration for example - we are adding OpenLineage (which is really
    > >     just an API) so that others can build "everything-lineage" on top of
    > >     it. So we are adding a minimum-possible set of APIs and integration so
    > >     that we can expose the lineage capability so that all the lineage "UI"
    > >     and other use cases that lineage exposes would be done outside. We are
    > >     in a strong position to do it - being sure that when we expose it,
    > >     others will implement the integration they care about.
    > >
    > >     I think more and more (and it has been preached by Ash mostly, but
    > >     also others) that we should be focusing solely on being an extremely
    > >     powerful and robust scheduler and make sure we are exposing all of the
    > >     possible things that can be exposed as an external API (while still
    > >     providing a basic implementation that still makes Airflow a "finished"
    > >     product that can be used to handle basic cases).
    > >
    > >     BTW. We are now preparing for the Airflow Summit CFP (some
    > >     announcements will follow shortly, I do not want to spill too many
    > >     beans) and we have a very interesting broad category "Airflow and
    > >     ...". And I think we should work in the direction that the `...` is
    > >     far bigger than Airflow itself.
    > >
    > >     J.
    > >
    > >     On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <ka...@gmail.com> wrote:
    > >     >
    > >     > Great idea Vikram, I love the idea of making this a provider/pluggable.
    > >     >
    > >     > In some ways, we already have a pluggable mechanism for Authentication with Auth Backends [1]. Where we will need a lot more work, I think, is:
    > >     >
    > >     > * Replacing Access Control provided by FAB with a base/core security model (that is still resource-based) [2]
    > >     > * Extend this to the other Airflow components (scheduler, workers, triggerer, cli) or make them all driven by a single API that takes care of Auth. This will also reduce a lot of duplication of code across many of the components
    > >     > * For backwards compatibility, we could ship with a FAB provider that still uses Flask-AppBuilder, in addition to our recommended provider that will have more features; users/companies/stakeholders can build on top of that provider to extend it further.
    > >     >
    > >     >
    > >     > References:
    > >     > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
    > >     > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
    > >     >
    > >     > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <sh...@amazon.com.invalid> wrote:
    > >     >>
    > >     >> Hi Vikram,
    > >     >> Thank you for taking the time to review the proposal. I appreciate your insights — I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.
    > >     >>
    > >     >> With regard to the separation of user management, I understand your concerns and, at a high level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
    > >     >> 1. How will the user-id/group-id interface interact with Airflow resource-level permissions? What parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
    > >     >> 2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
    > >     >> 3. What are the limitations of this new pluggable model compared to FAB? Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
    > >     >> 4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
    > >     >> 5. What will the migration process look like for existing users to this non-FAB pluggable model?
    > >     >>
    > >     >> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
    > >     >>
    > >     >> Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMCs. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.
    > >     >>
    > >     >> Regards
    > >     >> Shubham
    > >     >>
    > >     >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    > >     >>
    > >     >>     Hey Vikram,
    > >     >>
    > >     >>     I think it's brilliant and I wonder how it happened that it had not
    > >     >>     occurred to us earlier. And I believe that is due to the natural
    > >     >>     tendency of "following as we always did" rather than thinking
    > >     >>     completely out-of-the-box. Thanks Vikram for bringing it up.
    > >     >>
    > >     >>     The funny thing is that when I see this:
    > >     >>
    > >     >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
    > >     >>
    > >     >>     I almost immediately think - NOOOOO, why, it's always been here, how
    > >     >>     can we remove it?
    > >     >>
    > >     >>     But then if you look a bit closer:
    > >     >>
    > >     >>     > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.
    > >     >>
    > >     >>     Then it starts to make way more sense. Way more.
    > >     >>
    > >     >>     And when you look further:
    > >     >>
    > >     >>     >  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
    > >     >>
    > >     >>     My heart jumps and I am immediately sold on the idea.
    > >     >>
    > >     >>     When I was commenting on the doc initially, something was not right.
    > >     >>     I had a feeling it is probably the 5th time I am looking at and
    > >     >>     commenting on a similar document. And, well, I did, actually. Most of
    > >     >>     the things we discussed there are already implemented out there. We
    > >     >>     just need to make sure we expose enough of the API to use them. For
    > >     >>     example we have Keycloak that is an open source implementation of
    > >     >>     Identity and Access Management, with everything out there already
    > >     >>     integrated, and I've been part of the project that integrated just the
    > >     >>     authentication part. Now if we rethink the authorization and make it
    > >     >>     simpler and "externally driven", this will not only be faster IMHO,
    > >     >>     but also will allow enterprise users to integrate much better.
    > >     >>
    > >     >>     I believe following the path that Vikram outlined will be a good
    > >     >>     direction for everyone in the community - including all the Managed
    > >     >>     Service providers, who will have a far easier job on integrating
    > >     >>     Airflow into their authentication models.
    > >     >>
    > >     >>     J.
    > >     >>
    > >     >>
    > >     >>
    > >     >>     On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
    > >     >>     <vi...@astronomer.io.invalid> wrote:
    > >     >>     >
    > >     >>     > Shubham and Vincent,
    > >     >>     >
    > >     >>     > Let me start by saying that I apologize for my delayed response to your original email.
    > >     >>     >
    > >     >>     > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
    > >     >>     >
    > >     >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
    > >     >>     >
    > >     >>     > I strongly believe that the core Airflow mission is for the community at large and for data practitioners either individuals or teams within enterprises. And therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But, I think there is a never ending list of user management features which are needed to support Enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
    > >     >>     >
    > >     >>     > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
    > >     >>     >
    > >     >>     > Best regards,
    > >     >>     > Vikram
    > >     >>     >
    > >     >>     >
    > >     >>     > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
    > >     >>     >>
    > >     >>     >> Thanks
    > >     >>     >>
    > >     >>     >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    > >     >>     >>
    > >     >>     >>     Added.
    > >     >>     >>
    > >     >>     >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
    > >     >>     >>     <vi...@amazon.com.invalid> wrote:
    > >     >>     >>     >
    > >     >>     >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
    > >     >>     >>     >
    > >     >>     >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    > >     >>     >>     >
    > >     >>     >>     >     What's your cwiki ID, Vincent? (I'll add you without going into details yet.)
    > >     >>     >>     >
    > >     >>     >>
    > >     >>
    > >


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Jarek Potiuk <ja...@potiuk.com>.
And to Kaxil's mail: yep. What you wrote is exactly what I understood
needs to be done.

On Fri, Feb 17, 2023 at 2:40 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
>
> I do not think this will happen. I think part of the effort should be
> not only to implement the API but also to provide a fully fledged (though
> simple) implementation of such a provider which works with an
> open-source implementation of identity - KeyCloak is one that comes to
> my mind. It's possibly jumping ahead a bit to say "let's use KeyCloak
> as reference provider we can release", but I think KeyCloak has all we
> need:
> * integration with multiple authentication providers and protocols
> * User Management:
> https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html
> * Role Management including user mapping:
> https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html
> * Group management:
> https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html
>
> It comes with a management console, CLI and much more
> (auditing/session management etc. etc.)
>
> In a way it would simply provide very much the same as what the FAB
> Security Manager does, but with much more complete scope and - most
> importantly - it would not be "part of Airflow as FAB is", it would be
> "outside" of it and the only thing Airflow would provide is merely
> pointers to the Docs of Keycloak on how to integrate it with Airflow
> as a proxy: https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html
> (or it could be done by writing an Airflow KeyCloak Adapter - to be
> decided what would be easier to maintain). The users will be free to
> configure KeyCloak proxy as they see fit. No DB needed in Airflow to
> manage any of those, no UI, no API, no CLI - all that delegated out
> and integrated via incoming headers or adapter.
>
> The users will have several choices:
>
> 1) For existing users/those who want to keep all "in-airflow-ui"  they
> could use FAB Provider (which will be separated from the Core). Same
> as today, but without the advanced management features for groups and
> tenants. We might consider dropping that altogether eventually.
> 2) If they are on premise - they can use KeyCloak Provider - by
> following our advice/suggestions/simple guidelines on how to
> integrate. They would have to manage their own KeyCloak instance (it
> won't be a "standard" part of Airflow).
> 3) If the user runs on AWS/Azure/GCP/others - each cloud would
> (hopefully) develop their own provider to integrate with IAM etc. -
> they could use that provider directly. Or they could use and manage
> their KeyCloak in the cloud as they see fit (it supports all the
> clouds' OAuth integration). Or develop their own provider.
> 4) Those on managed services will have no choice but to use the
> provider installed by their service.
>
> I think that all gives the user the choice - if they want role
> management and multi-tenant capabilities, fine, but they will have to
> manage the users outside of Airflow and integrate Airflow with it (and
> they can either integrate with what they have already or use
> KeyCloak). And this does not really impair them.
>
> J.
>
>
> On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham
> <sh...@amazon.com.invalid> wrote:
> >
> > Thanks, Kaxil – that helped to clarify the proposal a bit more.
> >
> > > Replacing Access Control provided by FAB with a base/core security model (that is still resource-based)
> >
> > Are you suggesting that we build this resource-driven security model directly into Airflow, without relying on external dependencies like FAB?
> >
> > > Extend this to the other Airflow components (scheduler, workers, triggerer, cli)
> >
> > Are there cases where the scheduler or CLI would require the authorization API? Since they are considered trusted components, I assumed they would not need it.
> >
> >
> > Jarek - as always, I appreciate you sharing your thoughts and having an open discussion.
> >
> > > Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others".  I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
> >
> > Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
> >
> > I am still unclear about other user scenarios related to user management, besides multi-tenancy, that Airflow customers are looking to enable. While the extensibility we aim for will enable this, is there a need for it? Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in building a custom user management provider that works with your platform? Have there been cases where your customers were limited by the current permissioning model, and you considered replacing FAB?
> >
> > I believe that the primary motivation for "user management provider" is driven by the excitement around getting rid of FAB, which I think we can still achieve while including multi-tenancy in the core Airflow. Both should be treated as separate problems.
> >
> > References:
> > 1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice
> >
> > On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> >
> >     Comment to Shubham's question:
> >
> >     > In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
> >
> >     I am glad you asked. I think this is one of the things I wanted to
> >     achieve by adding this page
> >     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
> >     - it will be live in 2.6 and one of the main parts is this one:
> >
> >     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities
> >
> >     Which really explains what "Airflow as a Platform" is all about. I do
> >     not think we already know all the parts that should be converted into
> >     "Airflow extendability". It's more of an incremental effort like that
> >     where we have those bright ideas "Hey - this part can be removed and
> >     delegated to others".  I think this has never been formulated
> >     explicitly but I think for quite a while we are really in the mode
> >     where we think much more about what we can SPLIT OUT from Airflow
> >     rather than what we can ADD to Airflow.
> >
> >     When you look at it, this is also the main idea behind the OpenLineage
> >     integration for example - we are adding OpenLineage (which is really
> >     just an API) so that others can build "everything-lineage" on top of
> >     it. So we are adding a minimum-possible set of APIs and integration so
> >     that we can expose the lineage capability so that all the lineage "UI"
> >     and other use cases that lineage exposes would be done outside. We are
> >     in a strong position to do it - being sure that when we expose it,
> >     others will implement the integration they care about.
> >
> >     I think more and more (and it has been preached by Ash mostly, but
> >     also others) that we should be focusing solely on being an extremely
> >     powerful and robust scheduler and make sure we are exposing all of the
> >     possible things that can be exposed as an external API (while still
> >     providing a basic implementation that still makes Airflow a "finished"
> >     product that can be used to handle basic cases).
> >
> >     BTW. We are now preparing for the Airflow Summit CFP (some
> >     announcements will follow shortly, I do not want to spill too many
> >     beans) and we have a very interesting broad category "Airflow and
> >     ...". And I think we should work in the direction that the `...` is
> >     far bigger than Airflow itself.
> >
> >     J.
> >
> >     On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <ka...@gmail.com> wrote:
> >     >
> >     > Great idea Vikram, I love the idea of making this a provider/pluggable.
> >     >
> >     > In some ways, we already have a pluggable mechanism for Authentication with Auth Backends [1]. Where we will need a lot more work, I think, is:
> >     >
> >     > * Replacing Access Control provided by FAB with a base/core security model (that is still resource-based) [2]
> >     > * Extend this to the other Airflow components (scheduler, workers, triggerer, cli) or make them all driven by a single API that takes care of Auth. This will also reduce a lot of duplication of code across many of the components
> >     > * For backwards compatibility, we could ship with a FAB provider that still uses Flask-AppBuilder, in addition to our recommended provider that will have more features; users/companies/stakeholders can build on top of that provider to extend it further.
> >     >
> >     >
> >     > References:
> >     > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
> >     > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
> >     >
> >     > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <sh...@amazon.com.invalid> wrote:
> >     >>
> >     >> Hi Vikram,
> >     >> Thank you for taking the time to review the proposal. I appreciate your insights — I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.
> >     >>
> >     >> With regard to the separation of user management, I understand your concerns and, at a high level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
> >     >> 1. How will the user-id/group-id interface interact with Airflow resource-level permissions? What parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
> >     >> 2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
> >     >> 3. What are the limitations of this new pluggable model compared to FAB? Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
> >     >> 4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
> >     >> 5. What will the migration process look like for existing users to this non-FAB pluggable model?
> >     >>
> >     >> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
> >     >>
> >     >> Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMCs. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.
> >     >>
> >     >> Regards
> >     >> Shubham
> >     >>
> >     >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> >     >>
> >     >>     Hey Vikram,
> >     >>
> >     >>     I think it's brilliant and I wonder how it happened that it had not
> >     >>     occurred to us earlier. I believe that is due to the natural
> >     >>     tendency of "following what we always did" rather than thinking
> >     >>     completely out-of-the-box. Thanks Vikram for bringing it up.
> >     >>
> >     >>     The funny thing is that when I see this:
> >     >>
> >     >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
> >     >>
> >     >>     I almost immediately think - NOOOOO, why, it's always been here, how
> >     >>     can we remove it?
> >     >>
> >     >>     But then if you look a bit closer:
> >     >>
> >     >>     > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.
> >     >>
> >     >>     Then it starts to make way more sense. Way more.
> >     >>
> >     >>     And when you look further:
> >     >>
> >     >>     >  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
> >     >>
> >     >>     My heart jumps and I am immediately sold on the idea.
> >     >>
> >     >>     When I was commenting on the doc initially, something was not right.
> >     >>     I had a feeling it was probably the 5th time I was looking at and
> >     >>     commenting on a similar document. And, well, I did, actually. Most of
> >     >>     the things we discussed there are already implemented out there. We
> >     >>     just need to make sure we expose enough of the API to use them. For
> >     >>     example, we have Keycloak, which is an open source implementation of
> >     >>     Identity and Access Management, with everything out there already
> >     >>     integrated, and I've been part of a project that integrated just the
> >     >>     authentication part. Now if we rethink the authorization and make it
> >     >>     simpler and "externally driven", this will not only be faster IMHO,
> >     >>     but will also allow enterprise users to integrate much better.
> >     >>
> >     >>     I believe following the path that Vikram outlined will be a good
> >     >>     direction for everyone in the community - including all the Managed
> >     >>     Service providers, who will have a far easier job integrating
> >     >>     Airflow into their authentication models.
> >     >>
> >     >>     J.
> >     >>
> >     >>
> >     >>
> >     >>     On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
> >     >>     <vi...@astronomer.io.invalid> wrote:
> >     >>     >
> >     >>     > Shubham and Vincent,
> >     >>     >
> >     >>     > Let me start by saying that I apologize for my delayed response to your original email.
> >     >>     >
> >     >>     > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
> >     >>     >
> >     >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
> >     >>     >
> >     >>     > I strongly believe that the core Airflow mission is for the community at large and for data practitioners either individuals or teams within enterprises. And therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But, I think there is a never ending list of user management features which are needed to support Enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
> >     >>     >
> >     >>     > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
> >     >>     >
> >     >>     > Best regards,
> >     >>     > Vikram
> >     >>     >
> >     >>     >
> >     >>     > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
> >     >>     >>
> >     >>     >> Thanks
> >     >>     >>
> >     >>     >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> >     >>     >>
> >     >>     >>     Added.
> >     >>     >>
> >     >>     >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
> >     >>     >>     <vi...@amazon.com.invalid> wrote:
> >     >>     >>     >
> >     >>     >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
> >     >>     >>     >
> >     >>     >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> >     >>     >>     >
> >     >>     >>     >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
> >     >>     >>     >
> >     >>     >>
> >     >>
> >

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Jarek Potiuk <ja...@potiuk.com>.
> Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.

I do not think this will happen. I think the effort should not only
implement the API but also provide a fully fledged (though simple)
implementation of such a provider that works with an open-source
identity solution - KeyCloak is one that comes to mind. It is possibly
jumping ahead a bit to say "let's use KeyCloak as the reference
provider we can release", but I think KeyCloak has all we need:
* Integration with multiple authentication providers and protocols
* User Management:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/users/viewing.html
* Role management, including user mapping:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/roles/user-role-mappings.html
* Group management:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_admin/topics/groups/groups-vs-roles.html

It comes with a management console, a CLI and much more (auditing,
session management, etc.).

In a way it would provide very much the same as what the FAB Security
Manager does, but with a much more complete scope and - most
importantly - it would not be "part of Airflow as FAB is"; it would be
"outside" of it. The only thing Airflow would provide is pointers to
the Keycloak docs on how to integrate it with Airflow as a proxy:
https://wjw465150.gitbooks.io/keycloak-documentation/content/server_installation/topics/proxy.html
(or it could be done by writing an Airflow KeyCloak Adapter - to be
decided which would be easier to maintain). The users will be free to
configure the KeyCloak proxy as they see fit. No DB is needed in
Airflow to manage any of those, no UI, no API, no CLI - all of that is
delegated out and integrated via incoming headers or an adapter.

The users will have several choices:

1) Existing users, or those who want to keep everything "in-airflow-ui",
could use the FAB Provider (which will be separated from the Core). Same
as today, but without the advanced management features for groups and
tenants. We might consider dropping that altogether eventually.
2) If they are on-premises, they can use the KeyCloak Provider by
following our advice/suggestions/simple guidelines on how to
integrate. They would have to manage their own KeyCloak instance (it
won't be a "standard" part of Airflow).
3) If the user runs on AWS/Azure/GCP/others, each cloud would
(hopefully) develop its own provider to integrate with IAM etc., and
they could use that provider directly. Or they could use and manage
their own KeyCloak in the cloud as they see fit (it supports OAuth
integration with all the clouds). Or develop their own provider.
4) Those on managed services will have no choice but to use the
provider installed by their service.

I think all of that gives the user a choice - if they want role
management and multi-tenant capabilities, fine, but they will have to
manage the users outside of Airflow and integrate Airflow with that
(they can either integrate with what they already have or use
KeyCloak). And it does not really impair them.
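
The choices above could all sit behind one small interface. A rough
sketch in Python, assuming a hypothetical UserManagementProvider base
class and a registry keyed by name. None of these names exist in Airflow
today, and the grants are invented to mirror the "John can-edit dag1 and
can-view dag2" example raised earlier in the thread.

```python
from abc import ABC, abstractmethod


class UserManagementProvider(ABC):
    """Hypothetical plug-in point: core asks, the provider decides."""

    @abstractmethod
    def is_authorized(self, user: str, action: str, resource: str) -> bool:
        ...


class StaticProvider(UserManagementProvider):
    """Stand-in for a simple built-in provider backed by a fixed table."""

    def __init__(self, grants):
        # grants: iterable of (user, action, resource) tuples
        self._grants = set(grants)

    def is_authorized(self, user: str, action: str, resource: str) -> bool:
        return (user, action, resource) in self._grants


# Registry of available providers; a deployment would pick one by name.
# A KeyCloak-backed or cloud-IAM-backed provider would be registered here too.
PROVIDERS = {
    "static": lambda: StaticProvider(
        {("john", "can-edit", "dag1"), ("john", "can-view", "dag2")}
    ),
}


def load_provider(name: str) -> UserManagementProvider:
    """Instantiate the provider registered under the given name."""
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown user management provider: {name!r}") from None
    return factory()
```

With this shape, core code would only ever call
load_provider(...).is_authorized(user, action, resource), and where the
users, roles and groups actually live is the provider's business.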

J.


On Thu, Feb 16, 2023 at 6:27 AM Mehta, Shubham
<sh...@amazon.com.invalid> wrote:
>
> Thanks, Kaxil – that helped to clarify the proposal a bit more.
>
> > Replacing Access Control provided by FAB with a base/core security model (that is still resource-based)
>
> Are you suggesting that we build this resource-driven security model directly into Airflow, without relying on external dependencies like FAB?
>
> > Extend this to the other Airflow components (scheduler, workers, triggerer, cli)
>
> Are there cases where the scheduler or CLI would require the authorization API? Since they are considered trusted components, I assumed they would not need it.
>
>
> Jarek - as always, I appreciate you sharing your thoughts and having an open discussion.
>
> > Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others".  I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.
>
> Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.
>
> I am still unclear about other user scenarios related to user management, besides multi-tenancy, that Airflow customers are looking to enable. While the extensibility we aim for will enable this, is there a need for it? Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in building a custom user management provider that works with your platform? Have there been cases where your customers were limited by the current permissioning model, and you considered replacing FAB?
>
> I believe that the primary motivation for "user management provider" is driven by the excitement around getting rid of FAB, which I think we can still achieve while including multi-tenancy in the core Airflow. Both should be treated as separate problems.
>
> References:
> 1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice
>
> On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>     Comment to Shubham's question:
>
>     > In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
>
>     I am glad you asked. I think this is one of the things I wanted to
>     achieve by adding this page
>     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
>     - it will be live in 2.6 and one of the main parts is this one:
>
>     https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities
>
>     Which really explains what "Airflow as a Platform" is all about. I do
>     not think we already know all the parts that should be converted into
>     "Airflow extendability". It's more of an incremental effort like that
>     where we have those bright ideas "Hey - this part can be removed and
>     delegated to others".  I think this has never been formulated
>     explicitly but I think for quite a while we are really in the mode
>     where we think much more about what we can SPLIT OUT from Airflow
>     rather than what we can ADD to Airflow.
>
>     When you look at it, this is also the main idea behind the OpenLineage
>     integration, for example - we are adding OpenLineage (which is really
>     just an API) so that others can build "everything-lineage" on top of
>     it. So we are adding a minimum-possible set of APIs and integration so
>     that we can expose the lineage capability so that all the lineage "UI"
>     and other use cases that lineage exposes would be done outside. We are
>     in a strong position to do it - being sure that when we expose it,
>     others will implement the integration they care about.
>
>     I think more and more (and it has been preached by Ash mostly, but
>     also others) that we should be focusing solely on being an extremely
>     powerful and robust scheduler and make sure we are exposing all of the
>     possible things that can be exposed as an external API (while still
>     providing a basic implementation that makes Airflow a "finished"
>     product that can be used to handle basic cases).
>
>     BTW, we are now preparing for the Airflow Summit CFP (some
>     announcements will follow shortly, I do not want to spill too many
>     beans) and we have a very interesting broad category, "Airflow and
>     ...". And I think we should work in the direction that the `...` is
>     far bigger than Airflow itself.
>
>     J.
>
>     On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <ka...@gmail.com> wrote:
>     >
>     > Great idea Vikram, I love the idea of making this a provider/pluggable.
>     >
>     > In some ways, we already have a pluggable mechanism for Authentication with Auth Backends [1]. Where I think we will need a lot more work is:
>     >
>     > * Replacing Access Control provided by FAB with a base/core security model (that is still resource-based) [2]
>     > * Extend this to the other Airflow components (scheduler, workers, triggerer, cli) or make them all driven by a single API that takes care of Auth. This will also reduce a lot of code duplication across many of the components
>     > * For backwards compatibility, we could ship with a FAB provider that still uses Flask-AppBuilder, in addition to our recommended provider that will have more features and that users/companies/stakeholders can build on to extend it further.
>     >
>     >
>     > References:
>     > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
>     > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
>     >
>     > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <sh...@amazon.com.invalid> wrote:
>     >>
>     >> Hi Vikram,
>     >> Thank you for taking the time to review the proposal. I appreciate your insights — I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.
>     >>
>     >> With regard to the separation of user management, I understand your concerns and, at a high level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
>     >> 1. How will the user-id/group-id interface interact with Airflow resource-level permissions? What parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
>     >> 2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
>     >> 3. What are the limitations of this new pluggable model compared to FAB? Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
>     >> 4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
>     >> 5. What will the migration process look like for existing users to this non-FAB pluggable model?
>     >>
>     >> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
>     >>
>     >> Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMCs. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.
>     >>
>     >> Regards
>     >> Shubham
>     >>
>     >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >>
>     >>     Hey Vikram,
>     >>
>     >>     I think it's brilliant and I wonder how it happened that it had not
>     >>     occurred to us earlier. I believe that is due to the natural
>     >>     tendency of "following what we always did" rather than thinking
>     >>     completely out-of-the-box. Thanks Vikram for bringing it up.
>     >>
>     >>     The funny thing is that when I see this:
>     >>
>     >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
>     >>
>     >>     I almost immediately think - NOOOOO, why, it's always been here, how
>     >>     can we remove it?
>     >>
>     >>     But then if you look a bit closer:
>     >>
>     >>     > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.
>     >>
>     >>     Then it starts to make way more sense. Way more.
>     >>
>     >>     And when you look further:
>     >>
>     >>     >  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
>     >>
>     >>     My heart jumps and I am immediately sold on the idea.
>     >>
>     >>     When I was commenting on the doc initially, something was not right.
>     >>     I had a feeling it was probably the 5th time I was looking at and
>     >>     commenting on a similar document. And, well, I did, actually. Most of
>     >>     the things we discussed there are already implemented out there. We
>     >>     just need to make sure we expose enough of the API to use them. For
>     >>     example, we have Keycloak, which is an open source implementation of
>     >>     Identity and Access Management, with everything out there already
>     >>     integrated, and I've been part of a project that integrated just the
>     >>     authentication part. Now if we rethink the authorization and make it
>     >>     simpler and "externally driven", this will not only be faster IMHO,
>     >>     but will also allow enterprise users to integrate much better.
>     >>
>     >>     I believe following the path that Vikram outlined will be a good
>     >>     direction for everyone in the community - including all the Managed
>     >>     Service providers, who will have a far easier job integrating
>     >>     Airflow into their authentication models.
>     >>
>     >>     J.
>     >>
>     >>
>     >>
>     >>     On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
>     >>     <vi...@astronomer.io.invalid> wrote:
>     >>     >
>     >>     > Shubham and Vincent,
>     >>     >
>     >>     > Let me start by saying that I apologize for my delayed response to your original email.
>     >>     >
>     >>     > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
>     >>     >
>     >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
>     >>     >
>     >>     > I strongly believe that the core Airflow mission is for the community at large and for data practitioners either individuals or teams within enterprises. And therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But, I think there is a never ending list of user management features which are needed to support Enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
>     >>     >
>     >>     > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
>     >>     >
>     >>     > Best regards,
>     >>     > Vikram
>     >>     >
>     >>     >
>     >>     > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
>     >>     >>
>     >>     >> Thanks
>     >>     >>
>     >>     >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >>     >>
>     >>     >>     Added.
>     >>     >>
>     >>     >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
>     >>     >>     <vi...@amazon.com.invalid> wrote:
>     >>     >>     >
>     >>     >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
>     >>     >>     >
>     >>     >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >>     >>     >
>     >>     >>     >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
>     >>     >>     >
>     >>     >>
>     >>
>

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Mehta, Shubham" <sh...@amazon.com.INVALID>.
Thanks, Kaxil – that helped to clarify the proposal a bit more.

> Replacing Access Control provided by FAB with a base/core security model (that is still resource-based)

Are you suggesting that we build this resource-driven security model directly into Airflow, without relying on external dependencies like FAB?

> Extend this to the other Airflow components (scheduler, workers, triggerer, cli)

Are there cases where the scheduler or CLI would require the authorization API? Since they are considered trusted components, I assumed they would not need it.


Jarek - as always, I appreciate you sharing your thoughts and having an open discussion.

> Which really explains what "Airflow as a Platform" is all about. I do not think we already know all the parts that should be converted into "Airflow extendability". It's more of an incremental effort like that where we have those bright ideas "Hey - this part can be removed and delegated to others".  I think this has never been formulated explicitly but I think for quite a while we are really in the mode where we think much more about what we can SPLIT OUT from Airflow rather than what we can ADD to Airflow.

Understood. I like the idea of extensibility and "Airflow as a platform." However, we should make sure that we do not worsen the user experience with the extensibility. The "User Management Provider" is something that could potentially make the user experience worse, especially for customers who are self-hosting Airflow. Managed services will ensure that they dedicate resources to maintaining their user management providers. Multi-tenancy will end up becoming a feature for managed service customers, leaving the 74% of Airflow users [1] with a less powerful Airflow. As an example, Timetables is a very powerful feature, which, anecdotally, no customer ends up using due to its complexity.

I am still unclear about other user scenarios related to user management, besides multi-tenancy, that Airflow customers are looking to enable. While the extensibility we aim for will enable this, is there a need for it? Also, @Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in building a custom user management provider that works with your platform? Have there been cases where your customers were limited by the current permissioning model, and you considered replacing FAB? 

I believe that the primary motivation for "user management provider" is driven by the excitement around getting rid of FAB, which I think we can still achieve while including multi-tenancy in the core Airflow. Both should be treated as separate problems.

References:
1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice

On 2023-02-14, 12:44 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    Comment to Shubham's question:

    > In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?

    I am glad you asked. I think this is one of the things I wanted to
    achieve by adding this page
    https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
    - it will be live in 2.6 and one of the main parts is this one:

    https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities

    Which really explains what "Airflow as a Platform" is all about. I do
    not think we already know all the parts that should be converted into
    "Airflow extensibility". It's more of an incremental effort, where we
    have those bright ideas: "Hey - this part can be removed and
    delegated to others". I think this has never been formulated
    explicitly, but for quite a while we have really been in the mode
    where we think much more about what we can SPLIT OUT from Airflow
    rather than what we can ADD to Airflow.

    When you look at it, this is also the main idea behind the OpenLineage
    integration, for example - we are adding OpenLineage (which is really
    just an API) so that others can build "everything-lineage" on top of
    it. So we are adding a minimum-possible set of APIs and integrations to
    expose the lineage capability, so that all the lineage "UI" and other
    use cases that lineage enables can be built outside of Airflow. We are
    in a strong position to do it - being sure that when we expose it,
    others will implement the integration they care about.

    I think more and more (and it has been preached mostly by Ash, but
    also by others) that we should be focusing solely on being an extremely
    powerful and robust scheduler, and make sure we are exposing all of the
    possible things that can be exposed as an external API (while still
    providing a basic implementation that keeps Airflow a "finished"
    product that can be used to handle basic cases).

    BTW. We are now preparing for the Airflow Summit CFP (some
    announcements will follow shortly, I do not want to spill too many
    beans) and we have a very interesting broad category "Airflow and
    ...". And I think we should work in the direction that the `...` is
    far bigger than Airflow itself.

    J.

    On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <ka...@gmail.com> wrote:
    >
    > Great idea Vikram, I love the idea of making this a provider/pluggable.
    >
    > In some ways, we already have a pluggable mechanism for Authentication with Auth Backends [1]. Where we will need a lot more work, I think, is:
    >
    > 1. Replacing Access Control provided by FAB with a base/core security model (that is still resource-based) [2]
    > 2. Extending this to the other Airflow components (scheduler, workers, triggerer, CLI) or making them all driven by a single API that takes care of Auth. This will also reduce a lot of code duplication across many of the components
    > 3. For backwards compatibility, we could ship with a FAB provider that still uses Flask-AppBuilder, in addition to our recommended provider that will have more features; users/companies/stakeholders can build on top of that provider to extend it further.
    >
    >
    > References:
    > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
    > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
    >
    > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <sh...@amazon.com.invalid> wrote:
    >>
    >> Hi Vikram,
    >> Thank you for taking the time to review the proposal. I appreciate your insights — I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.
    >>
    >> With regard to the separation of user management, I understand your concerns and, at a high level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
    >> 1. How will the user-id/group-id interface interact with Airflow resource-level permissions? What parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
    >> 2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
    >> 3. What are the limitations of this new pluggable model compared to FAB? Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
    >> 4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
    >> 5. What will the migration process look like for existing users to this non-FAB pluggable model?
    >>
    >> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?
    >>
    >> Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMC members. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.
    >>
    >> Regards
    >> Shubham
    >>
    >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>
    >>     Hey Vikram,
    >>
    >>     I think it's brilliant and I wonder how it happened that it had not
    >>     occurred to us earlier. I believe that is due to the natural
    >>     tendency of "doing things as we always did" rather than thinking
    >>     completely out-of-the-box. Thanks Vikram for bringing it up.
    >>
    >>     The funny thing is that when I see this:
    >>
    >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
    >>
    >>     I almost immediately think - NOOOOO, why, it's always been here, how
    >>     can we remove it?
    >>
    >>     But then if you look a bit closer:
    >>
    >>     > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.
    >>
    >>     Then it starts to make way more sense. Way more.
    >>
    >>     And when you look further:
    >>
    >>     >  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
    >>
    >>     My heart jumps and I am immediately sold on the idea.
    >>
    >>     When I was commenting on the doc initially, something was not right.
    >>     I had a feeling it was probably the 5th time I was looking at and
    >>     commenting on a similar document. And, well, it was, actually. Most of
    >>     the things we discussed there are already implemented out there. We
    >>     just need to make sure we expose enough of the API to use them. For
    >>     example, we have Keycloak, which is an open source implementation of
    >>     Identity and Access Management, with everything out there already
    >>     integrated, and I've been part of a project that integrated just the
    >>     authentication part. Now if we rethink the authorization and make it
    >>     simpler and "externally driven", this will not only be faster IMHO,
    >>     but will also allow enterprise users to integrate much better.
    >>
    >>     I believe following the path that Vikram outlined will be a good
    >>     direction for everyone in the community - including all the Managed
    >>     Service providers, who will have a far easier job integrating
    >>     Airflow into their authentication models.
    >>
    >>     J.
    >>
    >>
    >>
    >>     On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
    >>     <vi...@astronomer.io.invalid> wrote:
    >>     >
    >>     > Shubham and Vincent,
    >>     >
    >>     > Let me start by saying that I apologize for my delayed response to your original email.
    >>     >
    >>     > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
    >>     >
    >>     > However, I don't agree that this level of user management belongs in "Core Airflow".
    >>     >
    >>     > I strongly believe that the core Airflow mission is for the community at large and for data practitioners, either individuals or teams within enterprises. Therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But I think there is a never-ending list of user management features needed to support enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
    >>     >
    >>     > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
    >>     >
    >>     > Best regards,
    >>     > Vikram
    >>     >
    >>     >
    >>     > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
    >>     >>
    >>     >> Thanks
    >>     >>
    >>     >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>     >>
    >>     >>     Added.
    >>     >>
    >>     >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
    >>     >>     <vi...@amazon.com.invalid> wrote:
    >>     >>     >
    >>     >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
    >>     >>     >
    >>     >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>     >>     >
    >>     >>     >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
    >>     >>     >
    >>     >>
    >>


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Jarek Potiuk <ja...@potiuk.com>.
Comment to Shubham's question:

> In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? Asking as I haven't come across them. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?

I am glad you asked. I think this is one of the things I wanted to
achieve by adding this page
https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
- it will be live in 2.6 and one of the main parts is this one:

https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities

Which really explains what "Airflow as a Platform" is all about. I do
not think we already know all the parts that should be converted into
"Airflow extensibility". It's more of an incremental effort, where we
have those bright ideas: "Hey - this part can be removed and
delegated to others". I think this has never been formulated
explicitly, but for quite a while we have really been in the mode
where we think much more about what we can SPLIT OUT from Airflow
rather than what we can ADD to Airflow.

When you look at it, this is also the main idea behind the OpenLineage
integration, for example - we are adding OpenLineage (which is really
just an API) so that others can build "everything-lineage" on top of
it. So we are adding a minimum-possible set of APIs and integrations to
expose the lineage capability, so that all the lineage "UI" and other
use cases that lineage enables can be built outside of Airflow. We are
in a strong position to do it - being sure that when we expose it,
others will implement the integration they care about.

I think more and more (and it has been preached mostly by Ash, but
also by others) that we should be focusing solely on being an extremely
powerful and robust scheduler, and make sure we are exposing all of the
possible things that can be exposed as an external API (while still
providing a basic implementation that keeps Airflow a "finished"
product that can be used to handle basic cases).

BTW. We are now preparing for the Airflow Summit CFP (some
announcements will follow shortly, I do not want to spill too many
beans) and we have a very interesting broad category "Airflow and
...". And I think we should work in the direction that the `...` is
far bigger than Airflow itself.

J.


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Kaxil Naik <ka...@gmail.com>.
Great idea Vikram, I love the idea of making this a provider/pluggable.

In some ways, we already have a pluggable mechanism for Authentication with
Auth Backends *[1]*. Where we will need a lot more work, I think, is:

   1. Replacing Access Control provided by FAB with a base/core security
   model (that is still resource-based) *[2]*
   2. Extending this to the other Airflow components (scheduler, workers,
   triggerer, CLI) or making them all driven by a single API that takes care
   of Auth. This will also reduce a lot of code duplication across many of
   the components
   3. For backwards compatibility, we could ship with a FAB provider that
   still uses Flask-AppBuilder, in addition to our recommended provider that
   will have more features; users/companies/stakeholders can build on top of
   that provider to extend it further.
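
As a rough illustration of the points above (not an existing Airflow API), here is a minimal Python sketch of a pluggable authorization interface with a simple built-in implementation. The names Permission, BaseUserManagementProvider and StaticUserManagementProvider are hypothetical and chosen only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Permission:
    """Hypothetical (action, resource) pair, e.g. ("can_edit", "dag1")."""

    action: str
    resource: str


class BaseUserManagementProvider(ABC):
    """Hypothetical interface that core components would call; not a real Airflow class."""

    @abstractmethod
    def is_authorized(self, user_id: str, permission: Permission) -> bool:
        """Return True if the user holds the given permission."""


class StaticUserManagementProvider(BaseUserManagementProvider):
    """Simple built-in implementation backed by an in-memory mapping of grants."""

    def __init__(self, grants: Dict[str, Set[Permission]]) -> None:
        self._grants = grants

    def is_authorized(self, user_id: str, permission: Permission) -> bool:
        # Unknown users have no grants, so they are denied by default.
        return permission in self._grants.get(user_id, set())


provider = StaticUserManagementProvider(
    {"john": {Permission("can_edit", "dag1"), Permission("can_view", "dag2")}}
)
```

A more complex implementation (for example, one delegating to an external identity provider) would subclass the same base class, so the scheduler, workers and CLI would only ever call is_authorized().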


References:
[1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
[2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Mehta, Shubham" <sh...@amazon.com.INVALID>.
Hi Vikram,
Thank you for taking the time to review the proposal. I appreciate your insights — I will make sure to reach out to you directly in the future for feedback as that would've undoubtedly saved us some time and effort.

Regarding the separation of user management, I understand your concerns and, at a high level, I agree with you. However, I think it would be beneficial to have more details on how it will work. Here are a few questions that come to mind:
1. How will the user-id/group-id interface interact with Airflow resource-level permissions? Which parts of "John can-edit dag1 and can-view dag2" will be part of Airflow core? What will be exposed to the external system?
2. Who will be responsible for managing the resource-level permissions? Will it be the external system?
3. What are the limitations of this new pluggable model compared to FAB? Will there be restrictions on the granularity of resource access that Airflow admins can provide to their users?
4. As Jarek pointed out, with this change we want to make authorization externally driven. Will this have a significant impact on Airflow performance as authorization will be required for fetching variables, executing tasks, etc.?
5. What will the migration process look like for existing users to this non-FAB pluggable model?
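
To make question 1 more concrete, here is a rough sketch of one possible split. It is only an illustration under stated assumptions: the names AccessRequest, UserManagementProvider, BuiltInProvider and check_access are hypothetical and are not existing Airflow or FAB APIs. In this sketch core only knows the user id, the action and the resource, while the grants themselves ("John can-edit dag1 and can-view dag2") live behind the pluggable interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical shape of what core would pass to the pluggable layer."""

    user_id: str
    action: str    # e.g. "can-edit" or "can-view"
    resource: str  # e.g. "dag1"


class UserManagementProvider(Protocol):
    """Hypothetical interface; an assumption, not an Airflow API."""

    def is_authorized(self, request: AccessRequest) -> bool: ...


class BuiltInProvider:
    """Simple in-process implementation that keeps grants in memory."""

    def __init__(self, grants: set[tuple[str, str, str]]) -> None:
        self._grants = grants

    def is_authorized(self, request: AccessRequest) -> bool:
        key = (request.user_id, request.action, request.resource)
        return key in self._grants


def check_access(
    provider: UserManagementProvider, user_id: str, action: str, resource: str
) -> bool:
    """The single call core would make; it never inspects the grants itself."""
    return provider.is_authorized(AccessRequest(user_id, action, resource))


# "John can-edit dag1 and can-view dag2" expressed as grants held by the provider.
provider = BuiltInProvider(
    {
        ("john", "can-edit", "dag1"),
        ("john", "can-view", "dag2"),
    }
)

print(check_access(provider, "john", "can-edit", "dag1"))  # True
print(check_access(provider, "john", "can-edit", "dag2"))  # False
```

An external system would implement the same is_authorized method and answer from its own store, which is where questions 2 and 4 (who manages the grants, and what each check costs) would need an answer.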

In addition, are there any other user scenarios, beyond multi-tenancy, that Airflow users are looking to enable and that require this pluggability? I ask because I haven't come across any. Overall, I believe we need more information on your proposal before seeking feedback from the community. Could we work together during February to develop a concrete proposal?

Besides this, I would like to propose that we define the scope and long-term vision of "Airflow core". To achieve this, it may be helpful to first outline the perspectives of the Airflow PMC members. Recently, there have been discussions regarding the separation of executors into a separate package, the implementation of pluggable schedulers, and other related topics. Currently, these decisions and discussions are somewhat ad hoc and are made through the mailing list. I would be happy to collaborate and invest time in this effort.

Regards
Shubham

On 2023-02-13, 11:04 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    Hey Vikram,

    I think it's brilliant, and I wonder how it had not occurred to us
    earlier. I believe that is due to the natural tendency of "doing as
    we always did" rather than thinking completely out of the box.
    Thanks, Vikram, for bringing it up.

    The funny thing is that when I see this:

    > However, I don't agree that this level of user management belongs in "Core Airflow".

    I almost immediately think - NOOOOO, why, it's always been here, how
    can we remove it?

    But then if you look a bit closer:

    > think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.

    Then it starts to make way more sense. Way more.

    And when you look further:

    >  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?

    My heart jumps and I am immediately sold on the idea.

    When I was commenting on the doc initially, something did not feel
    right. I had a feeling that it was probably the 5th time I was looking
    at and commenting on a similar document. And, well, it actually was.
    Most of the things we discussed there are already implemented out
    there. We just need to make sure we expose enough of the API to use
    them. For example, we have Keycloak, which is an open source
    implementation of Identity and Access Management with everything
    already integrated, and I've been part of the project that integrated
    just the authentication part. Now, if we rethink the authorization and
    make it simpler and "externally driven", this will not only be faster
    IMHO, but will also allow enterprise users to integrate much better.

    I believe following the path that Vikram outlined will be a good
    direction for everyone in the community - including all the Managed
    Service providers, who will have a far easier job integrating
    Airflow into their authentication models.

    J.



    On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
    <vi...@astronomer.io.invalid> wrote:
    >
    > Shubham and Vincent,
    >
    > Let me start by saying that I apologize for my delayed response to your original email.
    >
    > I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
    >
    > However, I don't agree that this level of user management belongs in "Core Airflow".
    >
    > I strongly believe that the core Airflow mission is for the community at large and for data practitioners, whether individuals or teams within enterprises. Therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But I think there is a never-ending list of user management features that are needed to support enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
    >
    > I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
    >
    > Best regards,
    > Vikram
    >
    >
    > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
    >>
    >> Thanks
    >>
    >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>
    >>     Added.
    >>
    >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
    >>     <vi...@amazon.com.invalid> wrote:
    >>     >
    >>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
    >>     >
    >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>     >
    >>     >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
    >>     >
    >>


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hey Vikram,

I think it's brilliant, and I wonder how it had not occurred to us
earlier. I believe that is due to the natural tendency of "doing as
we always did" rather than thinking completely out of the box.
Thanks, Vikram, for bringing it up.

The funny thing is that when I see this:

> However, I don't agree that this level of user management belongs in "Core Airflow".

I almost immediately think - NOOOOO, why, it's always been here, how
can we remove it?

But then if you look a bit closer:

> think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers.

Then it starts to make way more sense. Way more.

And when you look further:

>  Maybe, this also enables us to get rid of the Fab security manager from core Airflow?

My heart jumps and I am immediately sold on the idea.

When I was commenting on the doc initially, something did not feel
right. I had a feeling that it was probably the 5th time I was looking
at and commenting on a similar document. And, well, it actually was.
Most of the things we discussed there are already implemented out
there. We just need to make sure we expose enough of the API to use
them. For example, we have Keycloak, which is an open source
implementation of Identity and Access Management with everything
already integrated, and I've been part of the project that integrated
just the authentication part. Now, if we rethink the authorization and
make it simpler and "externally driven", this will not only be faster
IMHO, but will also allow enterprise users to integrate much better.

I believe following the path that Vikram outlined will be a good
direction for everyone in the community - including all the Managed
Service providers, who will have a far easier job integrating
Airflow into their authentication models.

J.



On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
<vi...@astronomer.io.invalid> wrote:
>
> Shubham and Vincent,
>
> Let me start by saying that I apologize for my delayed response to your original email.
>
> I appreciate the detailed write-up and the thought behind it. I completely agree with your use case and understand how this is applicable to enterprises with multiple data teams using Airflow.
>
> However, I don't agree that this level of user management belongs in "Core Airflow".
>
> I strongly believe that the core Airflow mission is for the community at large and for data practitioners, whether individuals or teams within enterprises. Therefore, I don't disagree with the intent of making it easier for enterprise teams to adopt Airflow. But I think there is a never-ending list of user management features that are needed to support enterprise needs. We have already struggled with this over time and faced challenges with the Fab security manager and its integration in Airflow.
>
> I think we should use this opportunity and your use case to "separate the user management" from Core Airflow outside of the absolute basics. I think this is a time to consider the concept of a "user management provider" with a simple built-in implementation being the current Airflow functionality, enabling alternate more complex (but separate) implementations such as your proposal here as alternate user management providers. Maybe, this also enables us to get rid of the Fab security manager from core Airflow?
>
> Best regards,
> Vikram
>
>
> On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid> wrote:
>>
>> Thanks
>>
>> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>>
>>     Added.
>>
>>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
>>     <vi...@amazon.com.invalid> wrote:
>>     >
>>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
>>     >
>>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>>     >
>>     >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
>>     >
>>

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Vikram Koka <vi...@astronomer.io.INVALID>.
Shubham and Vincent,

Let me start by saying that I apologize for my delayed response to your
original email.

I appreciate the detailed write-up and the thought behind it. I completely
agree with your use case and understand how this is applicable to
enterprises with multiple data teams using Airflow.

However, I don't agree that this level of user management belongs in "Core
Airflow".

I strongly believe that the core Airflow mission is for the community at
large and for data practitioners, whether individuals or teams within
enterprises. Therefore, I don't disagree with the intent of making it
easier for enterprise teams to adopt Airflow. But I think there is a
never-ending list of user management features that are needed to support
enterprise needs. We have already struggled with this over time and faced
challenges with the Fab security manager and its integration in Airflow.

I think we should use this opportunity and your use case to "separate the
user management" from Core Airflow outside of the absolute basics. I think
this is a time to consider the concept of a "user management provider" with
a simple built-in implementation being the current Airflow functionality,
enabling alternate more complex (but separate) implementations such as your
proposal here as alternate user management providers. Maybe, this also
enables us to get rid of the Fab security manager from core Airflow?

Best regards,
Vikram


On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent <vi...@amazon.com.invalid>
wrote:

> Thanks
>
> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>     Added.
>
>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
>     <vi...@amazon.com.invalid> wrote:
>     >
>     > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
>     >
>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>     >
>     >     What's your cwiki ID, Vincent (I'll add you without going into
> details yet)
>     >
>
>

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Beck, Vincent" <vi...@amazon.com.INVALID>.
Thanks

On 2023-02-03, 10:55 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    Added.

    On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
    <vi...@amazon.com.invalid> wrote:
    >
    > Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
    >
    > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >
    >     What's your cwiki ID, Vincent (I'll add you without going into details yet)
    >


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Jarek Potiuk <ja...@potiuk.com>.
Added.

On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
<vi...@amazon.com.invalid> wrote:
>
> Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck
>
> On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>     What's your cwiki ID, Vincent (I'll add you without going into details yet)
>

Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by "Beck, Vincent" <vi...@amazon.com.INVALID>.
Thank you! https://cwiki.apache.org/confluence/display/~vin100.beck

On 2023-02-02, 5:38 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    What's your cwiki ID, Vincent (I'll add you without going into details yet)


Re: Seeking Feedback for Airflow Multi-Tenant Model Proposal

Posted by Jarek Potiuk <ja...@potiuk.com>.
What's your cwiki ID, Vincent? (I'll add you without going into details yet.)