Posted to users@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2021/06/13 15:45:22 UTC

[DISCUSS] Managing provider Connections via UI in managed Airflow services

Dear Airflow community,

Here is another outcome of recent discussions. I would like to draw
attention to potential Connection management problems that might affect
managed services for Airflow 2.0 and some providers.

With Airflow 2.0, connection UI "customisations" are baked into the
provider packages. To see, for example, the Postgres connection type in
the UI, you need to have the "postgres" provider installed in the webserver.
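
For readers less familiar with the mechanism: the customisation lives on the hook class that the provider ships. Below is a minimal sketch, assuming a hypothetical `ExampleDbHook`. Real hooks subclass `BaseHook` and may also define `get_connection_form_widgets()` returning form fields; both are left out here so the snippet runs without Airflow installed, and the field values are purely illustrative.

```python
from typing import Any, Dict


class ExampleDbHook:
    """Hypothetical stand-in for a provider hook; not a real Airflow class."""

    # Attributes the webserver reads to list the connection type in the UI.
    conn_name_attr = "example_conn_id"
    default_conn_name = "example_default"
    conn_type = "example_db"
    hook_name = "Example DB"

    @staticmethod
    def get_ui_field_behaviour() -> Dict[str, Any]:
        """Tell the connection form which standard fields to hide or relabel."""
        return {
            "hidden_fields": ["extra"],
            "relabeling": {"schema": "Database", "login": "User"},
        }


behaviour = ExampleDbHook.get_ui_field_behaviour()
print(ExampleDbHook.conn_type, behaviour["relabeling"]["schema"])
```

Because this metadata sits on the hook class, the webserver can only render it if the package containing the class is importable there.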

As far as I know, some of the managed Airflow services (MWAA, Composer,
possibly others) do not currently allow their users to install
additional packages in the webserver (the webserver container is different
from the scheduler/worker). This makes it impossible to configure/edit
provider connections via the UI (unless those providers are pre-installed
in the webserver image).

While forbidding "any" package installation is understandable from a
security point of view, I think the official
"apache-airflow-providers-*" packages should be allowlisted for those
images and installed or otherwise made available (for example by
pre-installing all providers in the webserver image, if rebuilding the
image dynamically is not possible from a security point of view).

I wonder what people (especially people from the MWAA and Composer teams)
think about it. Did I get the security concerns right? Any other
comments?


J.

-- 
+48 660 796 129

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
Side comment: I am super glad that we now also have Composer and MWAA
people commenting and answering on the devlist. I hope this is just the
beginning and you will get more involved over time :D

@Daniel:
> Is at all feasible to deprecate connection UI customization?  Then everything can just use `extra` json where the other params fall short.  Seems like an area where the benefit does not outweigh the complexity.  We could also take the opportunity to deprecate the long `extra` key names like `extra__google_cloud_platform__keyfile_dict` in favor of simpler ones e.g. `keyfile_dict`.

I think this is a very useful feature. As Ash mentioned, there is also
the test_connection feature added recently, and I have taken part in
quite a number of discussions where people said they actually find it
useful. We had a bug in 2.0.0 and there were complaints :)

> Thank you for surfacing this issue on a discussion. The major hurdle for managed services apart from the security constraints is on the licensing side. Previously when the code needed for connection templates was part of Airflow, we were able to bundle them as a solution as the code was under the Apache v2 license. Now that we have them separated out as provider packages, those come with dependencies that do not have "blessed" licenses that allow bundling them into managed service. I am sure GCP folks have similar restrictions on why they cannot add all 60+ providers as is into the base image.
> We recently did the manual exercise to assess each of those provider package and their dependencies, and only 20 of them made the cut for not having to use additional licenses like Facebook license, LGPL etc.

Valid point. Perfectly understandable. I can sympathise with that (we
had very similar discussions in the ASF about releasing source
packages). While for Airflow those provider packages are optional, so
according to ASF policies we can do it, I understand that for
managed services it might be a bit different.

> > >As a temporary workaround we baked all connections (list of them with their
> > >widgets pickled and stored inside) into a web server image, so that
> > >customers can add/edit them (even though not all providers packages are
> > >pre-installed). This is a temporary workaround that we came up with for now
> > >and we are looking for a long-term solution.

Yeah, this is exactly my thought on how to fix it, but we can do it in
a manageable way. I thought that we could simply provide a
tool/script/airflow command (might be community managed) that would
produce provider "shims" to handle the case. The provider manager
already extracts all the relevant information during CI for all the
providers released, and it could easily bundle this information into a
separate set of packages that would be super-slimmed-down versions of
providers with JUST the UI customization. You would then be able to build
and install such "shim" packages automatically instead of the main
provider whenever new providers are released. They would contain just
ASF-licensed code (so you should likely be OK, Subash) and no
dependencies (Eugene), and then you could choose which "real" and which
"shim" packages you would like to install on the webserver.

Would that work?

Maybe that's a nice contribution to the community. Happy to help on that one :)

J.

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Canapathy, Subash" <su...@amazon.com.INVALID>.
If the connection templates were available as a shim, we would certainly bake that into the webserver container. For now, we are choosing to bake in any provider with a preferred license to alleviate pain for most customers. The current list that we know is OK to go ahead with (in addition to the amazon provider package) is below:
apache-airflow-providers-databricks
apache-airflow-providers-ssh
apache-airflow-providers-postgres
apache-airflow-providers-docker
apache-airflow-providers-facebook
apache-airflow-providers-oracle
apache-airflow-providers-presto
apache-airflow-providers-salesforce
apache-airflow-providers-sftp
apache-airflow-providers-tableau

I will start a separate discussion on the private@ list. I might not have a subscription to it. I should run that by the AWS OSS group before I send it out.

Thanks
Subash

From: Jarek Potiuk <ja...@potiuk.com>
Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Date: Tuesday, June 15, 2021 at 11:45 AM
To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services


CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



On Tue, Jun 15, 2021 at 8:22 PM Canapathy, Subash <su...@amazon.com.invalid> wrote:
@Ash Berlin-Taylor<ma...@apache.org> – I don’t think that is entirely true. In 1.10 the connection templates code was part of the flask application and not bundled with the provider. Managed services took the webserver baseline as is and let the customers take decision on additives like FB-business, oracle etc.. without bundling them into the managed service software per AWS compliance guidelines. In 2.0 if we bake in all the providers, it will mean that we are baking in their dependencies along with.

Just to clarify - in 1.10 there were no providers (there were backports for them) but the dependencies you are talking about were coming through extras (and I think you had very limited set of extras which indeed made you not depend on the dependencies). So you are right in 1.10 you did not have this coupling (all the connection types were baked-in airflow as you explained).

 Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person on PMC but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities on the flask-app-builder and the underlying python runtime. There is 2 levels of isolation – one on the single tenancy of environments in MWAA under separate VPCs, and secondly on Fargate that prevents exploits to break out of the container boundaries into the hypervisor. Even with those, our security team had other possibilities of exploits unearthed in penetration testing that led to this decision.

Yep. I was involved in a vetting process for similar UI-facing apps and I understand where it comes from; however, we would love to hear what your concerns are, Subash. Maybe you can send a message to private@airflow.apache.org and explain?

Coming back to my question. Would that help if you could generate such a "shim" package and you install it at the webserver?

I can imagine a tool/command where you specify which built-in providers you installed, and run a command that could generate (out of Airflow sources corresponding to your version) a "whl" package containing:

* entrypoint returning set of connections that you miss
* extracted classes containing the meta-data and Field definitions for the UI for the relevant connections
* no additional dependencies

There is a small potential caveat with versioning for future versions of providers (but those connections change rarely and mostly in backwards compatible ways and likely you could regenerate the "shim" periodically).

J.

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
On Tue, Jun 15, 2021 at 8:22 PM Canapathy, Subash <su...@amazon.com.invalid>
wrote:

> @Ash Berlin-Taylor <as...@apache.org> – I don’t think that is entirely
> true. In 1.10 the connection templates code was part of the flask
> application and not bundled with the provider. Managed services took the
> webserver baseline as is and let the customers take decision on additives
> like FB-business, oracle etc.. without bundling them into the managed
> service software per AWS compliance guidelines. In 2.0 if we bake in all
> the providers, it will mean that we are baking in their dependencies along
> with.
>
>
Just to clarify: in 1.10 there were no providers (there were backports for
them), but the dependencies you are talking about came through extras
(and I think you had a very limited set of extras, which indeed kept you
from pulling in those dependencies). So you are right: in 1.10 you did not
have this coupling (all the connection types were baked into Airflow, as
you explained).


>  Regarding security constraints on why we disallow plugins and
> requirements on the webserver, I will have to discuss this in person on PMC
> but on a high level this comes down to remote code execution prevention on
> managed instances, opening possibilities of exploiting vulnerabilities on
> the flask-app-builder and the underlying python runtime. There is 2 levels
> of isolation – one on the single tenancy of environments in MWAA under
> separate VPCs, and secondly on Fargate that prevents exploits to break out
> of the container boundaries into the hypervisor. Even with those, our
> security team had other possibilities of exploits unearthed in penetration
> testing that led to this decision.
>

Yep. I was involved in a vetting process for similar UI-facing apps and I
understand where it comes from; however, we would love to hear what your
concerns are, Subash. Maybe you can send a message to
private@airflow.apache.org and explain?

Coming back to my question: would it help if you could generate such a
"shim" package and install it on the webserver?

I can imagine a tool/command where you specify which built-in providers you
have installed, and which generates (out of the Airflow sources
corresponding to your version) a "whl" package containing:

* an entrypoint returning the set of connections that you are missing
* extracted classes containing the metadata and field definitions for the
UI for the relevant connections
* no additional dependencies

There is a small potential caveat with versioning for future versions of
providers (but those connections change rarely and mostly in
backwards-compatible ways, and you could likely regenerate the "shim"
periodically).
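
To make the idea a bit more concrete, here is a rough sketch of what such a generated shim could contain. It is only an illustration under assumptions: the package and module names (`example-airflow-connection-shims`, `example_shims`) and the helper `missing_connection_types` are hypothetical, the relabeling values are not copied from the real provider, and the provider-info keys follow my understanding of what Airflow 2.0 expects from the `apache_airflow_provider` entry point group.

```python
from typing import Any, Dict, Iterable, List


class PostgresShimHook:
    """Hypothetical shim: UI metadata only, no database driver is imported."""

    conn_name_attr = "postgres_conn_id"
    conn_type = "postgres"
    hook_name = "Postgres"

    @staticmethod
    def get_ui_field_behaviour() -> Dict[str, Any]:
        # Illustrative values, not taken from the real Postgres provider.
        return {"hidden_fields": [], "relabeling": {"schema": "Database"}}


def get_provider_info() -> Dict[str, Any]:
    """Function the shim wheel would expose through its entry point."""
    return {
        "package-name": "example-airflow-connection-shims",  # hypothetical
        "name": "Connection UI shims",
        "description": "UI-only connection metadata without dependencies.",
        "versions": ["0.0.1"],
        "hook-class-names": ["example_shims.hooks.PostgresShimHook"],
    }


def missing_connection_types(
    available: Iterable[str], installed: Iterable[str]
) -> List[str]:
    """Connection types that need a shim because no real provider is present."""
    return sorted(set(available) - set(installed))


print(missing_connection_types(["postgres", "oracle", "ssh"], ["ssh"]))
print(get_provider_info()["hook-class-names"])
```

The generator would run `missing_connection_types` against the providers already in the image and emit one shim class per missing type, so the wheel itself needs nothing beyond Airflow.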

J.

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Tomasz Urbaszek <tu...@gmail.com>.
> However I still don't really understand the risk - if dag authors can
> write dags they can run a python operator to do what ever they like. How
> are plugins different?
> The main thing I can't wrap my head around why suitably permissioned
> users can't have the ability to customise the webserver image/container.
I second Ash here; I'm still not sure I understand what the risk is.
As was said, anyone who can submit a DAG can run arbitrary code
or drop the Airflow database.

Tomek

On Sat, 19 Jun 2021 at 10:04, Ash Berlin-Taylor <as...@apache.org> wrote:

> > Plugins, providers, and their associated Python libraries all need to
> execute code in order to be installed which is a vulnerability.
>
> Please rephrase this - I understand what you mean, but this is too broad a
> statement. It is at worst a _potential_ vulnerability.
>
> However I still don't really understand the risk - if dag authors can
> write dags they can run a python operator to do what ever they like. How
> are plugins different?
>
> The main thing I can't wrap my head around why suitably permissioned users
> can't have the ability to customise the webserver image/container.
>
> -ash
>
>
> On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID>
> wrote:
>>
>> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>>
>> I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
>>
>> On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>>
>>
>> That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>>>
>>
>>     But in Airflow 2.0 the code provided by "DAG writers" is not executed
>>     any more.  This is entirely gone together with Airflow 1.10.  This has
>>     been handled by DAG serialization, which is the only option available
>>     in 2.0. I do not see how the "Users" could add any code if "Admins"
>>     control the packages that are installed in the webserver. Now if
>>     Admin/User is the only problem then I think this is really
>>     misunderstanding coming from the pre-DAG-serialization world of Apache
>>     Airflow.
>>
>>     J.
>>
>>

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Ash Berlin-Taylor <as...@apache.org>.
> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.

Please rephrase this - I understand what you mean, but this is too broad a statement. It is at worst a _potential_ vulnerability.

However, I still don't really understand the risk: if DAG authors can write DAGs, they can run a Python operator to do whatever they like. How are plugins different?

The main thing I can't wrap my head around is why suitably permissioned users can't have the ability to customise the webserver image/container.

-ash


On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID> wrote:
>Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>
>I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
>
>On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>
>    > That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>
>    But in Airflow 2.0 the code provided by "DAG writers" is not executed
>    any more.  This is entirely gone together with Airflow 1.10.  This has
>    been handled by DAG serialization, which is the only option available
>    in 2.0. I do not see how the "Users" could add any code if "Admins"
>    control the packages that are installed in the webserver. Now if
>    Admin/User is the only problem then I think this is really
>    misunderstanding coming from the pre-DAG-serialization world of Apache
>    Airflow.
>
>    J.
>

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Canapathy, Subash" <su...@amazon.com.INVALID>.
We have asked our security team to re-assess this and provide guidance in case they feel that there are exploits. If they do share those, I will start a conversation on the private email list.

On 6/28/21, 3:02 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    Hey everyone,

    Coming back to the discussion - Josh, Subash, do you have anything
    more to share regarding the security (again - might be on private@).

    We've been recently chosen by the ASF to write the blog post about the
    security practices
    https://blogs.apache.org/foundation/entry/success-at-apache-security-in
    and we have the "Airflow security" discussion panel at the Airflow
    Summit - https://airflowsummit.org/sessions/2021/panel-airflow-security
    so I think it would be great to know before that if there is anything
    we should do, or at least plan about it.

    We also started to work more closely with Dolev, a security researcher
    and expert who will take part in the panel and who raised a few security
    issues for Airflow and he is looking more closely at
    all-things-airflow, so maybe a good time to take a closer look at the
    plugins/providers/webserver case.

    J.

    On Sat, Jun 19, 2021 at 1:47 PM Kaxil Naik <ka...@gmail.com> wrote:
    >
    > Said differently, Plugins should be reviewed by the team similar to how the admin team would review DAGs.
    >
    > But definitely happy to hear about all specific security concerns in as much detail as possible (on private@ though to avoid making those details public risking some less-secure envs).
    >
    > Regards,
    > Kaxil
    >
    > On Sat, Jun 19, 2021, 11:56 Kaxil Naik <ka...@gmail.com> wrote:
    >>
    >> Regarding manipulating/compromising auth via plugins, auth backend is set in airflow.cfg /env vars and is used when the Webserver is started, you can't manipulate it.
    >>
    >> If there is any security vector you have discovered I would suggest you to email private@airflow.apache.org with replication details.
    >>
    >> And I agree with what has been already discussed about Dagfile Code and Plugins. What specific risk are we talking about?
    >>
    >> Can you go one level deeper in the details please and use private@ if you have reproduction steps.
    >>
    >> Regards,
    >> Kaxil
    >>
    >>
    >>
    >> On Sat, Jun 19, 2021, 09:21 Ash Berlin-Taylor <as...@apache.org> wrote:
    >>>
    >>> > I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins? Seems that would be more consistent.
    >>>
    >>> There are more to providers than just defining connection types.
    >>>
    >>> For example you can define custom operator links in providers http://airflow.apache.org/docs/apache-airflow/stable/howto/define_extra_link.html and to avoid potential deserialisation attacks the links must be registered with the webserver code before it will create an instance of the class
    >>>
    >>> (And links are classes rather than static so that you can do things like generate temporary/presigned S3 URLs)
    >>>
    >>> -Ash
    >>>
    >>> On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID> wrote:
    >>>>
    >>>> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
    >>>>
    >>>> I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
    >>>>
    >>>> On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >>>>
    >>>>
    >>>>> That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
    >>>>
    >>>>
    >>>>     But in Airflow 2.0 the code provided by "DAG writers" is not executed
    >>>>     any more.  This is entirely gone together with Airflow 1.10.  This has
    >>>>     been handled by DAG serialization, which is the only option available
    >>>>     in 2.0. I do not see how the "Users" could add any code if "Admins"
    >>>>     control the packages that are installed in the webserver. Now if
    >>>>     Admin/User is the only problem then I think this is really
    >>>>     misunderstanding coming from the pre-DAG-serialization world of Apache
    >>>>     Airflow.
    >>>>
    >>>>     J.
    >>>>


    --
    +48 660 796 129


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hey everyone,

Coming back to the discussion: Josh, Subash, do you have anything
more to share regarding security (again, it might be on private@)?

We were recently chosen by the ASF to write a blog post about our
security practices
https://blogs.apache.org/foundation/entry/success-at-apache-security-in
and we have the "Airflow security" discussion panel at the Airflow
Summit - https://airflowsummit.org/sessions/2021/panel-airflow-security
so I think it would be great to know before then if there is anything
we should do, or at least plan for.

We have also started to work more closely with Dolev, a security
researcher and expert who will take part in the panel and who raised a
few security issues for Airflow. He is looking more closely at
all-things-airflow, so maybe it is a good time to take a closer look at
the plugins/providers/webserver case.

J.

On Sat, Jun 19, 2021 at 1:47 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> Said differently, Plugins should be reviewed by the team similar to how the admin team would review DAGs.
>
> But definitely happy to hear about all specific security concerns in as much detail as possible (on private@ though to avoid making those details public risking some less-secure envs).
>
> Regards,
> Kaxil
>
> On Sat, Jun 19, 2021, 11:56 Kaxil Naik <ka...@gmail.com> wrote:
>>
>> Regarding manipulating/compromising auth via plugins, auth backend is set in airflow.cfg /env vars and is used when the Webserver is started, you can't manipulate it.
>>
>> If there is any security vector you have discovered I would suggest you to email private@airflow.apache.org with replication details.
>>
>> And I agree with what has been already discussed about Dagfile Code and Plugins. What specific risk are we talking about?
>>
>> Can you go one level deeper in the details please and use private@ if you have reproduction steps.
>>
>> Regards,
>> Kaxil
>>
>>
>>
>> On Sat, Jun 19, 2021, 09:21 Ash Berlin-Taylor <as...@apache.org> wrote:
>>>
>>> > I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins? Seems that would be more consistent.
>>>
>>> There are more to providers than just defining connection types.
>>>
>>> For example you can define custom operator links in providers http://airflow.apache.org/docs/apache-airflow/stable/howto/define_extra_link.html and to avoid potential deserialisation attacks the links must be registered with the webserver code before it will create an instance of the class
>>>
>>> (And links are classes rather than static so that you can do things like generate temporary/presigned S3 URLs)
>>>
>>> -Ash
>>>
>>> On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID> wrote:
>>>>
>>>> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>>>>
>>>> I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
>>>>
>>>> On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>>>>
>>>>
>>>>> That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>>>>
>>>>
>>>>     But in Airflow 2.0 the code provided by "DAG writers" is not executed
>>>>     any more.  This is entirely gone together with Airflow 1.10.  This has
>>>>     been handled by DAG serialization, which is the only option available
>>>>     in 2.0. I do not see how the "Users" could add any code if "Admins"
>>>>     control the packages that are installed in the webserver. Now if
>>>>     Admin/User is the only problem then I think this is really
>>>>     misunderstanding coming from the pre-DAG-serialization world of Apache
>>>>     Airflow.
>>>>
>>>>     J.
>>>>


-- 
+48 660 796 129

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Kaxil Naik <ka...@gmail.com>.
Said differently, plugins should be reviewed by the team, similar to how
the admin team would review DAGs.

But I am definitely happy to hear about all specific security concerns in
as much detail as possible (on private@ though, to avoid making those
details public and putting less-secure environments at risk).

Regards,
Kaxil

On Sat, Jun 19, 2021, 11:56 Kaxil Naik <ka...@gmail.com> wrote:

> Regarding manipulating/compromising auth via plugins, auth backend is set
> in airflow.cfg /env vars and is used when the Webserver is started, you
> can't manipulate it.
>
> If there is any security vector you have discovered I would suggest you to
> email private@airflow.apache.org with replication details.
>
> And I agree with what has been already discussed about Dagfile Code and
> Plugins. What specific risk are we talking about?
>
> Can you go one level deeper in the details please and use private@ if you
> have reproduction steps.
>
> Regards,
> Kaxil
>
>
>
> On Sat, Jun 19, 2021, 09:21 Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> > I would turn your argument the other way around--if you're already in a
>> no-install, serialized model for DAGs why not extend that to all aspects of
>> the webserver such as connections and UI plugins? Seems that would be more
>> consistent.
>>
>> There are more to providers than just defining connection types.
>>
>> For example you can define custom operator links in providers
>> http://airflow.apache.org/docs/apache-airflow/stable/howto/define_extra_link.html
>> and to avoid potential deserialisation attacks the links must be registered
>> with the webserver code before it will create an instance of the class
>>
>> (And links are classes rather than static so that you can do things like
>> generate temporary/presigned S3 URLs)
>>
>> -Ash
>>
>> On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID>
>> wrote:
>>>
>>> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>>>
>>> I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
>>>
>>> On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>>>
>>>
>>> That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>>>>
>>>
>>>     But in Airflow 2.0 the code provided by "DAG writers" is not executed
>>>     any more.  This is entirely gone together with Airflow 1.10.  This has
>>>     been handled by DAG serialization, which is the only option available
>>>     in 2.0. I do not see how the "Users" could add any code if "Admins"
>>>     control the packages that are installed in the webserver. Now if
>>>     Admin/User is the only problem then I think this is really
>>>     misunderstanding coming from the pre-DAG-serialization world of Apache
>>>     Airflow.
>>>
>>>     J.
>>>
>>>

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Kaxil Naik <ka...@gmail.com>.
Regarding manipulating/compromising auth via plugins: the auth backend is
set in airflow.cfg / env vars and is read when the webserver starts, so
you can't manipulate it.
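
As a rough illustration of where that setting comes from: Airflow lets an `AIRFLOW__{SECTION}__{KEY}` environment variable override the same key in airflow.cfg. The sketch below mimics that lookup order only; `resolve_option` and `env_var_name` are hypothetical helpers, not Airflow's own implementation, and `[api] auth_backend` is used purely as an example key.

```python
import os
from typing import Mapping, Optional


def env_var_name(section: str, key: str) -> str:
    """Build the AIRFLOW__{SECTION}__{KEY} name used for env overrides."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def resolve_option(
    section: str,
    key: str,
    file_values: Mapping[str, Mapping[str, str]],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the env override if present, else the value from the cfg file."""
    environ = os.environ if environ is None else environ
    name = env_var_name(section, key)
    if name in environ:
        return environ[name]
    return file_values.get(section, {}).get(key)


# Stand-in for parsed airflow.cfg contents (illustrative values only).
cfg = {"api": {"auth_backend": "airflow.api.auth.backend.deny_all"}}

print(env_var_name("api", "auth_backend"))
print(resolve_option("api", "auth_backend", cfg, environ={}))
```

Both sources are controlled by whoever deploys the webserver, which is the point being made here.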

If you have discovered any security vector, I would suggest you email
private@airflow.apache.org with reproduction details.

And I agree with what has already been discussed about DAG file code and
plugins. What specific risk are we talking about?

Can you go one level deeper into the details, please, and use private@ if
you have reproduction steps?

Regards,
Kaxil



On Sat, Jun 19, 2021, 09:21 Ash Berlin-Taylor <as...@apache.org> wrote:

> > I would turn your argument the other way around--if you're already in a
> no-install, serialized model for DAGs why not extend that to all aspects of
> the webserver such as connections and UI plugins? Seems that would be more
> consistent.
>
> There is more to providers than just defining connection types.
>
> For example you can define custom operator links in providers
> http://airflow.apache.org/docs/apache-airflow/stable/howto/define_extra_link.html
> and to avoid potential deserialisation attacks the links must be registered
> with the webserver code before it will create an instance of the class
>
> (And links are classes rather than static so that you can do things like
> generate temporary/presigned S3 URLs)
>
> -Ash
>
> On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID>
> wrote:
>>
>> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>>
>> I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
>>
>> On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>>
>> That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>>>
>>
>>     But in Airflow 2.0 the code provided by "DAG writers" is not executed
>>     any more.  This is entirely gone together with Airflow 1.10.  This has
>>     been handled by DAG serialization, which is the only option available
>>     in 2.0. I do not see how the "Users" could add any code if "Admins"
>>     control the packages that are installed in the webserver. Now if
>>     Admin/User is the only problem then I think this is really
>>     misunderstanding coming from the pre-DAG-serialization world of Apache
>>     Airflow.
>>
>>     J.
>>
>>

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Ash Berlin-Taylor <as...@apache.org>.
> I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.

There is more to providers than just defining connection types.

For example, you can define custom operator links in providers (http://airflow.apache.org/docs/apache-airflow/stable/howto/define_extra_link.html), and to avoid potential deserialisation attacks the links must be registered with the webserver code before it will create an instance of the class.

(And links are classes rather than static so that you can do things like generate temporary/presigned S3 URLs)
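
A rough sketch of the registration idea, assuming a simple allowlist keyed by class path. This is not Airflow's implementation; PresignedUrlLink, REGISTERED_LINKS, deserialize_link and the example.invalid URL are all hypothetical names used only to illustrate why an unregistered class is never instantiated.

```python
class PresignedUrlLink:
    """Hypothetical link class: it computes its URL on demand instead of storing it."""

    name = "Presigned log"

    def __init__(self, bucket: str):
        self.bucket = bucket

    def get_link(self, key: str) -> str:
        # A real link would ask the storage service for a temporary URL here.
        return f"https://{self.bucket}.example.invalid/{key}?signature=placeholder"


# Only classes installed alongside the webserver code appear in this allowlist.
REGISTERED_LINKS = {
    "my_provider.links.PresignedUrlLink": PresignedUrlLink,
}


def deserialize_link(class_path: str, kwargs: dict):
    """Instantiate a link only if its class path was registered beforehand."""
    try:
        link_class = REGISTERED_LINKS[class_path]
    except KeyError:
        raise ValueError(f"Operator link {class_path!r} is not registered") from None
    return link_class(**kwargs)


link = deserialize_link("my_provider.links.PresignedUrlLink", {"bucket": "logs"})
print(link.get_link("task/1.log"))
```

The serialized data only names a class; the lookup decides whether that name is trusted, so arbitrary class paths in serialized DAGs cannot trigger arbitrary imports.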

-Ash

On 18 June 2021 22:58:29 BST, "Jackson, John" <ja...@amazon.com.INVALID> wrote:
>Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>
>I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.
>
>On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>    > That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>
>    But in Airflow 2.0 the code provided by "DAG writers" is not executed
>    any more.  This is entirely gone together with Airflow 1.10.  This has
>    been handled by DAG serialization, which is the only option available
>    in 2.0. I do not see how the "Users" could add any code if "Admins"
>    control the packages that are installed in the webserver. Now if
>    Admin/User is the only problem then I think this is really
>    misunderstanding coming from the pre-DAG-serialization world of Apache
>    Airflow.
>
>    J.
>

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Jackson, John" <ja...@amazon.com.INVALID>.
Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.

I would turn your argument the other way around--if you're already in a no-install, serialized model for DAGs why not extend that to all aspects of the webserver such as connections and UI plugins?  Seems that would be more consistent.

On 2021-06-18, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    > That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.

    But in Airflow 2.0 the code provided by "DAG writers" is not executed
    any more.  This is entirely gone together with Airflow 1.10.  This has
    been handled by DAG serialization, which is the only option available
    in 2.0. I do not see how the "Users" could add any code if "Admins"
    control the packages that are installed in the webserver. Now if
    Admin/User is the only problem then I think this is really
    misunderstanding coming from the pre-DAG-serialization world of Apache
    Airflow.

    J.


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
Just for clarity - correction to the last paragraph - <if added by the
User, the package is added only to "worker/scheduler">

J.

On Sat, Jun 19, 2021 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> Hey Subash, John
>
> > Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.
>
> @John - it's you who introduced the User/Admin separation and
> reasoning. I think you should follow the logical consequence of it and
> introduce different levels of access for those two types of users to
> manage the platform. You can control who has access to
> install things and where. You are already managing the access control
> for reconfiguring MWAA, and I am sure you do not give
> casual "users" the ability to control certain aspects of the platform.
> I am sure you could restrict the ability to install packages on the
> webserver to admins only, and keep it open for users on the
> "scheduler/worker". Is that not possible? It sounds like what you
> really need from your description.
>
> > @Jarek - you are right about the use/admin difference, it’s a disambiguation that permeates beyond the airflow UI layer in MWAA - IAM auth is used for determining authN and AuthZ, hence to secure the webserver from un-authorized code, we would have to either a/ treat plugin updates as an elevated permission activity, or b/ separate out the webserver intended requirements/plugins from the ones required for DAGs so that the authZ can be handled separately.
>
> Correct. This is exactly what I propose. Have a separate
> "providers/plugins" install which only admins can update. Any package
> added by an "Admin" is added to both the webserver and the
> worker/scheduler. If you want DAG-only packages that
> are needed by "Users", they can be added only to workers. Sounds pretty
> straightforward.
>
> > We stayed with the one-DAG-bad ideology to not add complexity to customers and coaching them on "if you add to A it goes here, and if B it goes to webserver". That’s is why we are now between rock and a hard place - not being to open up all installs into webserver OR separate the DAG bag for webserver and other entities.
>
> No. This is different. It's not "what" you install but "who" installs
> it. I just follow the distinction introduced by John - if your
> corporate customers have two distinct types of users, "Admins" and
> "Users", I think you should follow this and introduce those two
> different types of users. When a package is added by an Admin user, it
> should be added to both - webserver and worker/scheduler. If it is
> added by a "User" - then it is added only to "worker/scheduler".
> Then if the admins (I guess those are the ones who need to configure
> connections anyway) need a "connection type", they could add
> the right provider themselves. Users will not be able to add them.
> That completely solves the problem that John mentioned, I believe.
> Please correct me if I am wrong.
>
> > On 6/18/21, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
> >
> >     > That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
> >
> >     But in Airflow 2.0 the code provided by "DAG writers" is not executed
> >     any more.  This is entirely gone together with Airflow 1.10.  This has
> >     been handled by DAG serialization, which is the only option available
> >     in 2.0. I do not see how the "Users" could add any code if "Admins"
> >     control the packages that are installed in the webserver. Now if
> >     Admin/User is the only problem then I think this is really
> >     misunderstanding coming from the pre-DAG-serialization world of Apache
> >     Airflow.
> >
> >     J.
> >
>
>
> --
> +48 660 796 129



-- 
+48 660 796 129

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
Hey Subash, John

> Plugins, providers, and their associated Python libraries all need to execute code in order to be installed which is a vulnerability.  Plugins in particular are often developed/installed by the data engineers and not by system administrators, leading us back to our original problem.

@John - it's you who introduced the User/Admin separation and
reasoning. I think you should follow the logical consequence of it and
introduce different levels of access for those two types of users to
manage the platform. You can control who has access to
install things and where. You are already managing the access control
for reconfiguring MWAA, and I am sure you do not give
casual "users" the ability to control certain aspects of the platform.
I am sure you could restrict the ability to install packages on the
webserver to admins only, and keep it open for users on the
"scheduler/worker". Is that not possible? It sounds like what you
really need from your description.

> @Jarek - you are right about the user/admin difference; it’s a distinction that permeates beyond the Airflow UI layer in MWAA - IAM auth is used for determining authN and authZ. Hence, to secure the webserver from unauthorized code, we would have to either a/ treat plugin updates as an elevated permission activity, or b/ separate out the webserver-intended requirements/plugins from the ones required for DAGs so that the authZ can be handled separately.

Correct. This is exactly what I propose. Have a separate
"providers/plugins" install which only admins can update. Any package
added by an "Admin" is added to both the webserver and the
worker/scheduler. If you want DAG-only packages that
are needed by "Users", they can be added only to workers. Sounds pretty
straightforward.

> We stayed with the one-DAG-bag ideology to avoid adding complexity for customers and coaching them on "if you add to A it goes here, and if B it goes to webserver". That is why we are now between a rock and a hard place - not being able to open up all installs into the webserver OR separate the DAG bag for the webserver and other entities.

No. This is different. It's not "what" you install but "who" installs
it. I just follow the distinction introduced by John - if your
corporate customers have two distinct types of users, "Admins" and
"Users", I think you should follow this and introduce those two
different types of users. When a package is added by an Admin user, it
should be added to both - webserver and worker/scheduler. If it is
added by a "User" - then it is added only to "worker/scheduler".
Then if the admins (I guess those are the ones who need to configure
connections anyway) need a "connection type", they could add
the right provider themselves. Users will not be able to add them.
That completely solves the problem that John mentioned, I believe.
Please correct me if I am wrong.
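
As a rough sketch of the split proposed here (admin-added packages reach the webserver as well as worker/scheduler, user-added packages reach worker/scheduler only), assuming a managed service that knows the role of whoever requests the install. The Role enum, the component names and install_targets are hypothetical and do not correspond to any MWAA or Composer API.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    USER = "user"


WEBSERVER = "webserver"
SCHEDULER = "scheduler"
WORKER = "worker"


def install_targets(role: Role) -> frozenset:
    """Decide which components receive a package, based on who adds it."""
    if role is Role.ADMIN:
        # Admin-added packages (e.g. providers with connection types) go everywhere.
        return frozenset({WEBSERVER, SCHEDULER, WORKER})
    # User-added packages never reach the webserver.
    return frozenset({SCHEDULER, WORKER})


for role in Role:
    print(role.value, sorted(install_targets(role)))
```

The decision depends only on who installs, not on what is installed, so customers do not have to learn a second packaging mechanism.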

> On 6/18/21, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>     > That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.
>
>     But in Airflow 2.0 the code provided by "DAG writers" is not executed
>     any more.  This is entirely gone together with Airflow 1.10.  This has
>     been handled by DAG serialization, which is the only option available
>     in 2.0. I do not see how the "Users" could add any code if "Admins"
>     control the packages that are installed in the webserver. Now if
>     Admin/User is the only problem then I think this is really
>     misunderstanding coming from the pre-DAG-serialization world of Apache
>     Airflow.
>
>     J.
>


-- 
+48 660 796 129

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Canapathy, Subash" <su...@amazon.com.INVALID>.
@Jarek - you are right about the user/admin difference; it’s a distinction that permeates beyond the Airflow UI layer in MWAA - IAM auth is used for determining authN and authZ. Hence, to secure the webserver from unauthorized code, we would have to either a/ treat plugin updates as an elevated permission activity, or b/ separate out the webserver-intended requirements/plugins from the ones required for DAGs so that the authZ can be handled separately.

We stayed with the one-DAG-bag ideology to avoid adding complexity for customers and coaching them on "if you add to A it goes here, and if B it goes to webserver". That is why we are now between a rock and a hard place - not being able to open up all installs into the webserver OR separate the DAG bag for the webserver and other entities.

On 6/18/21, 1:36 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    > That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or any anything else they'd like.

    But in Airflow 2.0 the code provided by "DAG writers" is not executed
    any more.  This is entirely gone together with Airflow 1.10.  This has
    been handled by DAG serialization, which is the only option available
    in 2.0. I do not see how the "Users" could add any code if "Admins"
    control the packages that are installed in the webserver. Now if
    Admin/User is the only problem then I think this is really
    misunderstanding coming from the pre-DAG-serialization world of Apache
    Airflow.

    J.


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
> That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or anything else they'd like.

But in Airflow 2.0 the code provided by "DAG writers" is not executed
any more.  This is entirely gone together with Airflow 1.10.  This has
been handled by DAG serialization, which is the only option available
in 2.0. I do not see how the "Users" could add any code if "Admins"
control the packages that are installed in the webserver. Now if
Admin/User is the only problem then I think this is really a
misunderstanding coming from the pre-DAG-serialization world of Apache
Airflow.
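
To illustrate the point about DAG serialization, here is a minimal sketch in which the webserver side only parses data and never imports or executes DAG-author code. The JSON layout and the function names (serialize_dag, load_for_webserver) are assumptions for illustration; Airflow's real serialized format is richer than this.

```python
import json


def serialize_dag(dag_id: str, task_ids: list, dependencies: list) -> str:
    """Runs where DAG files are parsed (not in the webserver); emits plain data."""
    return json.dumps({"dag_id": dag_id, "tasks": task_ids, "edges": dependencies})


def load_for_webserver(serialized: str) -> dict:
    """Runs in the webserver: parses JSON only, no import or exec of user code."""
    return json.loads(serialized)


stored = serialize_dag("example", ["extract", "load"], [["extract", "load"]])
dag_view = load_for_webserver(stored)
print(dag_view["dag_id"], dag_view["tasks"])
```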

J.

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Jackson, John" <ja...@amazon.com.INVALID>.
That would certainly help a bit, but unfortunately it's not just the packages.  It's the fact that authentication is tied to Python code that can be patched by anyone with permission to execute code on the web server, which in turn would give them access to packages or anything else they'd like.

What would be ideal is if the web server got its entire identity and capabilities from a central, secure source--think serialized DAGs, but for all Airflow UI extensions and configurations.  Then the UI is just that--providing UI--but does not contain any code that can be exploited.  It could be hardened and secured without impacting the extensible nature of Airflow.
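
One way to picture "serialized DAGs, but for UI extensions" is a connection form described purely as data that the web server renders without importing provider code. This is only a sketch of the idea, not an existing Airflow feature; the JSON layout, POSTGRES_FORM and form_labels are hypothetical.

```python
import json

# Hypothetical declarative description of a connection form, stored centrally.
POSTGRES_FORM = json.dumps({
    "connection_type": "postgres",
    "hidden_fields": [],
    "relabeling": {"schema": "Database"},
})


def form_labels(serialized_form: str, default_fields: list) -> dict:
    """Compute the visible fields and their labels from data alone."""
    spec = json.loads(serialized_form)
    visible = [f for f in default_fields if f not in spec["hidden_fields"]]
    return {f: spec["relabeling"].get(f, f.capitalize()) for f in visible}


print(form_labels(POSTGRES_FORM, ["host", "schema", "login"]))
```

A data-only description covers static form tweaks; behaviour that needs code at render time (such as dynamically generated links) would not fit this model without further design.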

On 2021-06-18, 1:00 PM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    > The only way to be 100% sure that users aren't changing the way the web server behaves is to not permit its alteration.  UI plugins, package installations, and library changes are among the various vulnerabilities that could be exploited.  For example, I could write a plugin that patches the auth functions and allows everyone Admin access regardless of their predetermined role.  Without strict security controls there will be a limit to Airflow adoption amongst Enterprise customers.  For Airflow to grow, it must offer a secure-by-design-friendly infrastructure.  Ideally the web server is a window into what Airflow is doing, but does not allow access or modification to any of the internal behaviour of the system.

    Just a comment on this one. If this is only user vs. admin, I think
    this can be easily solved by only allowing admin users to add packages
    for the webserver, not the dag writers. Will that solve the problem ?

    J.



    On Fri, Jun 18, 2021 at 9:45 PM Jackson, John
    <ja...@amazon.com.invalid> wrote:
    >
    > Hi Folks,
    >
    > Product Manager for MWAA weighing in here, having spoken to--quite literally--hundreds of Airflow customers (both MWAA and in general).
    >
    > Enterprise organizations--those that use Airflow at scale--typically separate their "Administrators" from their "Users".  The former set up the security controls and make sure that users can't violate their organization's data security while still providing access to (often sensitive) data in order to accomplish their business goals.  The latter are the folks writing DAGs and monitoring their execution, and sometimes see those security controls as a hindrance to the ease with which they can write their data pipelines and orchestration.
    >
    > The weak spot in the security model is the web based user interface.  It needs to be accessible to users, sitting at their laptops, with relative ease but cannot be permitted to perform arbitrary tasks otherwise it can escape the bounds set to it.  Airflow is wonderful in that it's entirely written in Python and extensible.  However, that same ease of extensibility could easily be used to bypass the Administrator's security controls, such as auth plugins, and allow users access beyond which they should rightfully have (whether deliberately or by accident).
    >
    > The only way to be 100% sure that users aren't changing the way the web server behaves is to not permit its alteration.  UI plugins, package installations, and library changes are among the various vulnerabilities that could be exploited.  For example, I could write a plugin that patches the auth functions and allows everyone Admin access regardless of their predetermined role.  Without strict security controls there will be a limit to Airflow adoption amongst Enterprise customers.  For Airflow to grow, it must offer a secure-by-design-friendly infrastructure.  Ideally the web server is a window into what Airflow is doing, but does not allow access or modification to any of the internal behaviour of the system.
    >
    > Should there be some sort of signed and verified packages in the future, perhaps organizations will be more open to extensibility.  However, the "shared responsibility model" does not allow service providers, be it Astronomer, Google, AWS, or anyone else, to be cavalier with customers' security concerns; they must always default to the strictest security settings possible.  Customers look to managed services to provide guard rails that prevent them from data breaches while still benefiting from the features and capabilities of the software platform.
    >
    > Cheers,
    >
    > John
    >
    > On 2021-06-18, 11:40 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
    >
    >     I agree that this thread is probably not good for categorization of
    >     the offering but I also concur with Ash to get a better understanding
    >     of the risks involved.
    >
    >     I think I "feel" where it comes from and intuitively see that you
    >     might want to add additional or extra layers of precautions (and
    >     likely follow pressures from the internal security teams) but also
    >     Ash's point is quite important. We should get to the bottom of it, and
    >     if there are some real threats that we are not aware of, I think
    >     sharing details on private@airflow.apache.org is the right thing to
    >     do.
    >
    >     Maybe we will find that other users of Airflow are also at risk and we
    >     might want to protect them (and also all managed services but also
    >     individual installations) in the future by introducing some changes in
    >     this model.
    >
    >     BTW. Subash - you do not need to have a subscription to write to
    >     private@airflow.apache.org. Just send an email with the details and
    >     we will get it and we will be able to keep you in discussion when it
    >     follows. Also information for your security team
    >     https://www.apache.org/dev/pmc.html#mailing-list-private . One of the
    >     main purposes of the private@ mailing list is pre-disclosing security
    >     problems related to the project. And we are all obliged as PMCs (and
    >     all ASF members who read the list as well) to not disclose what is
    >     discussed there.
    >
    >     J,
    >
    >     On Fri, Jun 18, 2021 at 4:04 PM Ash Berlin-Taylor <as...@apache.org> wrote:
    >     >
    >     > No one has yet explained what the security concerns actually are. Is there some concrete thing that is a worry, or is it merely a concern that more things installed = marginally more risky?
    >     >
    >     > The blast radius is limited to a single Airflow deployment, and access is I assume sufficiently gated behind IAM perms anyway?
    >     >
    >     > By not letting users install extra modules in to the webserver image you are also removing their ability to use third party providers, such as these
    >     >
    >     > https://github.com/great-expectations/airflow-provider-great-expectations
    >     > https://github.com/fivetran/airflow-provider-fivetran
    >     > https://github.com/anyscale/airflow-provider-ray
    >     >
    >     > -- and there are only going to be more of these over time.
    >     >
    >     > Not to mention this blocks UI plugins entirely.
    >     >
    >     > I don't quite understand why MWAA concerns itself with exactly what is being installed in the webserver image on top of Airflow -- the Amazon Shared Responsibility model would I think already cover the "AWS takes care of the base, 'you' take care of what is running" (but I confess I haven't re-read it in a number of years)
    >     >
    >     > -ash
    >     >
    >     > On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
    >     >
    >     > Irrespective of personal categorization of the managed offerings Airflow-ness, there are obligations to adhere to a security bar and securing against any attack vectors a UI feature can introduce – and this will be true for any cloud service provider. I want to clarify that we were not suggesting to change any assumptions in current way of packaging providers but merely citing that we cannot use equivalence to earlier mono repo and add all 60+ of them on base image.
    >     >
    >     >
    >     >
    >     > Going back to the original discussion, we are in the process of pre-installing providers with Apache 2 license right away and others will be added (with approved exception) based on user demand.
    >     >
    >     >
    >     >
    >     > From: Ash Berlin-Taylor <as...@apache.org>
    >     > Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
    >     > Date: Wednesday, June 16, 2021 at 1:11 AM
    >     > To: "dev@airflow.apache.org" <de...@airflow.apache.org>
    >     > Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services
    >     >
    >     >
    >     >
    >     > On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
    >     >
    >     > Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person on PMC but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities on the flask-app-builder and the underlying python runtime.
    >     >
    >     >
    >     >
    >     > I'm sorry, I don't agree with this summary.
    >     >
    >     >
    >     >
    >     > Airflow's job is to run user submitted code, and to allow the UI to be pluggable.
    >     >
    >     >
    >     >
    >     > Are you providing Airflow, or an Airflow like service?
    >
    >
    >
    >     --
    >     +48 660 796 129
    >


    --
    +48 660 796 129


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
> The only way to be 100% sure that users aren't changing the way the web server behaves is to not permit its alteration.  UI plugins, package installations, and library changes are among the various vulnerabilities that could be exploited.  For example, I could write a plugin that patches the auth functions and allows everyone Admin access regardless of their predetermined role.  Without strict security controls there will be a limit to Airflow adoption amongst Enterprise customers.  For Airflow to grow, it must offer a secure-by-design-friendly infrastructure.  Ideally the web server is a window into what Airflow is doing, but does not allow access or modification to any of the internal behaviour of the system.

Just a comment on this one. If this is only user vs. admin, I think
this can be easily solved by only allowing admin users to add packages
for the webserver, not the DAG writers. Will that solve the problem?

J.



On Fri, Jun 18, 2021 at 9:45 PM Jackson, John
<ja...@amazon.com.invalid> wrote:
>
> Hi Folks,
>
> Product Manager for MWAA weighing in here, having spoken to--quite literally--hundreds of Airflow customers (both MWAA and in general).
>
> Enterprise organizations--those that use Airflow at scale--typically separate their "Administrators" from their "Users".  The former set up the security controls and make sure that users can't violate their organization's data security while still providing access to (often sensitive) data in order to accomplish their business goals.  The latter are the folks writing DAGs and monitoring their execution, and sometimes see those security controls as a hindrance to the ease with which they can write their data pipelines and orchestration.
>
> The weak spot in the security model is the web based user interface.  It needs to be accessible to users, sitting at their laptops, with relative ease but cannot be permitted to perform arbitrary tasks otherwise it can escape the bounds set to it.  Airflow is wonderful in that it's entirely written in Python and extensible.  However, that same ease of extensibility could easily be used to bypass the Administrator's security controls, such as auth plugins, and allow users access beyond which they should rightfully have (whether deliberately or by accident).
>
> The only way to be 100% sure that users aren't changing the way the web server behaves is to not permit its alteration.  UI plugins, package installations, and library changes are among the various vulnerabilities that could be exploited.  For example, I could write a plugin that patches the auth functions and allows everyone Admin access regardless of their predetermined role.  Without strict security controls there will be a limit to Airflow adoption amongst Enterprise customers.  For Airflow to grow, it must offer a secure-by-design-friendly infrastructure.  Ideally the web server is a window into what Airflow is doing, but does not allow access or modification to any of the internal behaviour of the system.
>
> Should there be some sort of signed and verified packages in the future, perhaps organizations will be more open to extensibility.  However, the "shared responsibility model" does not allow service providers, be it Astronomer, Google, AWS, or anyone else, to be cavalier with customers' security concerns; they must always default to the strictest security settings possible.  Customers look to managed services to provide guard rails that prevent them from data breaches while still benefiting from the features and capabilities of the software platform.
>
> Cheers,
>
> John
>
> On 2021-06-18, 11:40 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:
>
>     I agree that this thread is probably not good for categorization of
>     the offering but I also concur with Ash to get a better understanding
>     of the risks involved.
>
>     I think I "feel" where it comes from and intuitively see that you
>     might want to add additional or extra layers of precautions (and
>     likely follow pressures from the internal security teams) but also
>     Ash's point is quite important. We should get to the bottom of it, and
>     if there are some real threats that we are not aware of, I think
>     sharing details on private@airflow.apache.org is the right thing to
>     do.
>
>     Maybe we will find that other users of Airflow are also at risk and we
>     might want to protect them (and also all managed services but also
>     individual installations) in the future by introducing some changes in
>     this model.
>
>     BTW. Subash - you do not need to have a subscription to write to
>     private@airflow.apache.org. Just send an -email with the details and
>     we will get it and we will be able to keep you in discussion when it
>     follows. Also information for your security team
>     https://www.apache.org/dev/pmc.html#mailing-list-private . One of the
>     main purposes of the private@ mailing list is pre-disclosing security
>     problems related to the project. And we are all obliged as PMCs (and
>     all ASF members who read the list as well) to not disclose what is
>     discussed there.
>
>     J,
>
>     On Fri, Jun 18, 2021 at 4:04 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>     >
>     > No one as yet explained what the security concerns actually are? Is there some concrete thing that is a worry, is it merely a concern that more things installed = marginally more risky?
>     >
>     > The blast radius is limited to a single Airflow deployment, and access is I assume sufficiently gated behind IAM perms anyway?
>     >
>     > By not letting users install extra modules in to the webserver image you are also removing their ability to use third party providers, such as these
>     >
>     > https://github.com/great-expectations/airflow-provider-great-expectations
>     > https://github.com/fivetran/airflow-provider-fivetran
>     > https://github.com/anyscale/airflow-provider-ray
>     >
>     > -- and there are only going to be more of these over time.
>     >
>     > Not to mention this blocks UI plugins entirely.
>     >
>     > I don't quite understand why MWAA concerns itself with exactly what is being installed in the webserver image on top of Airflow -- the Amazon Shared Responsibility model would I think already cover the "AWS takes care of the base, 'you' take care of what is running" (but I confess I haven't re-read it in a number of years)
>     >
>     > -ash
>     >
>     > On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
>     >
>     > Irrespective of personal categorization of the managed offerings Airflow-ness, there are obligations to adhere to a security bar and securing against any attack vectors a UI feature can introduce – and this will be true for any cloud service provider. I want to clarify that we were not suggesting to change any assumptions in current way of packaging providers but merely citing that we cannot use equivalence to earlier mono repo and add all 60+ of them on base image.
>     >
>     >
>     >
>     > Going back to the original discussion, we are in the process of pre-installing providers with Apache 2 license right away and others will be added (with approved exception) based on user demand.
>     >
>     >
>     >
>     > From: Ash Berlin-Taylor <as...@apache.org>
>     > Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
>     > Date: Wednesday, June 16, 2021 at 1:11 AM
>     > To: "dev@airflow.apache.org" <de...@airflow.apache.org>
>     > Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services
>     >
>     >
>     >
>     > On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
>     >
>     > Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person on PMC but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities on the flask-app-builder and the underlying python runtime.
>     >
>     >
>     >
>     > I'm sorry, I don't agree with this summary.
>     >
>     >
>     >
>     > Airflow's job is to run user submitted code, and to allow the UI to be pluggable.
>     >
>     >
>     >
>     > Are you providing Airflow, or an Airflow like service?
>
>
>
>     --
>     +48 660 796 129
>


-- 
+48 660 796 129

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Jackson, John" <ja...@amazon.com.INVALID>.
Hi Folks,

Product Manager for MWAA weighing in here, having spoken to--quite literally--hundreds of Airflow customers (both MWAA and in general).

Enterprise organizations--those that use Airflow at scale--typically separate their "Administrators" from their "Users".  The former set up the security controls and make sure that users can't violate their organization's data security, while still providing access to (often sensitive) data in order to accomplish their business goals.  The latter are the folks writing DAGs and monitoring their execution, and they sometimes see those security controls as a hindrance to the ease with which they can write their data pipelines and orchestration.

The weak spot in the security model is the web-based user interface.  It needs to be accessible with relative ease to users sitting at their laptops, but it cannot be permitted to perform arbitrary tasks; otherwise it can escape the bounds set for it.  Airflow is wonderful in that it's entirely written in Python and extensible.  However, that same ease of extensibility could easily be used to bypass the Administrator's security controls, such as auth plugins, and allow users access beyond what they should rightfully have (whether deliberately or by accident).

The only way to be 100% sure that users aren't changing the way the web server behaves is to not permit its alteration.  UI plugins, package installations, and library changes are among the various vulnerabilities that could be exploited.  For example, I could write a plugin that patches the auth functions and allows everyone Admin access regardless of their predetermined role.  Without strict security controls there will be a limit to Airflow adoption amongst Enterprise customers.  For Airflow to grow, it must offer a secure-by-design-friendly infrastructure.  Ideally the web server is a window into what Airflow is doing, but does not allow access or modification to any of the internal behaviour of the system.  
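
To make the plugin example concrete, here is a deliberately toy sketch of why code loaded into the same Python process can override an authorization check. Nothing here is Airflow or Flask-AppBuilder code: the `Auth` class, its role table, and `load_untrusted_plugin` are hypothetical stand-ins chosen only to illustrate the mechanism.

```python
class Auth:
    """Toy authorization layer (hypothetical, not an Airflow API)."""

    ROLE_PERMISSIONS = {"Admin": {"view", "edit"}, "Viewer": {"view"}}

    def has_access(self, role: str, action: str) -> bool:
        # Allow the action only if the role's permission set contains it.
        return action in self.ROLE_PERMISSIONS.get(role, set())


def load_untrusted_plugin() -> None:
    """Simulates importing a plugin into the web server's own process."""
    # Python lets any loaded module rebind attributes of any other class,
    # so the plugin can replace the check with one that always says yes.
    Auth.has_access = lambda self, role, action: True


auth = Auth()
before = auth.has_access("Viewer", "edit")  # check as the administrator configured it
load_untrusted_plugin()
after = auth.has_access("Viewer", "edit")   # same call after the plugin was loaded
print(before, after)
```

The point of the sketch is only that role checks and plugins share one interpreter, so the check is as trustworthy as the least trusted code that gets imported alongside it.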

Should there be some sort of signed and verified packages in the future, perhaps organizations will be more open to extensibility.  However, the "shared responsibility model" does not allow service providers, be it Astronomer, Google, AWS, or anyone else, to be cavalier with customers' security concerns; they must always default to the strictest security settings possible.  Customers look to managed services to provide guard rails that protect them from data breaches while still letting them benefit from the features and capabilities of the software platform.

Cheers,

John

On 2021-06-18, 11:40 AM, "Jarek Potiuk" <ja...@potiuk.com> wrote:

    I agree that this thread is probably not good for categorization of
    the offering but I also concur with Ash to get a better understanding
    of the risks involved.

    I think I "feel" where it comes from and intuitively see that you
    might want to add additional or extra layers of precautions (and
    likely follow pressures from the internal security teams) but also
    Ash's point is quite important. We should get to the bottom of it, and
    if there are some real threats that we are not aware of, I think
    sharing details on private@airflow.apache.org is the right thing to
    do.

    Maybe we will find that other users of Airflow are also at risk and we
    might want to protect them (and also all managed services but also
    individual installations) in the future by introducing some changes in
    this model.

    BTW. Subash - you do not need to have a subscription to write to
    private@airflow.apache.org. Just send an -email with the details and
    we will get it and we will be able to keep you in discussion when it
    follows. Also information for your security team
    https://www.apache.org/dev/pmc.html#mailing-list-private . One of the
    main purposes of the private@ mailing list is pre-disclosing security
    problems related to the project. And we are all obliged as PMCs (and
    all ASF members who read the list as well) to not disclose what is
    discussed there.

    J,

    On Fri, Jun 18, 2021 at 4:04 PM Ash Berlin-Taylor <as...@apache.org> wrote:
    >
    > No one as yet explained what the security concerns actually are? Is there some concrete thing that is a worry, is it merely a concern that more things installed = marginally more risky?
    >
    > The blast radius is limited to a single Airflow deployment, and access is I assume sufficiently gated behind IAM perms anyway?
    >
    > By not letting users install extra modules in to the webserver image you are also removing their ability to use third party providers, such as these
    >
    > https://github.com/great-expectations/airflow-provider-great-expectations
    > https://github.com/fivetran/airflow-provider-fivetran
    > https://github.com/anyscale/airflow-provider-ray
    >
    > -- and there are only going to be more of these over time.
    >
    > Not to mention this blocks UI plugins entirely.
    >
    > I don't quite understand why MWAA concerns itself with exactly what is being installed in the webserver image on top of Airflow -- the Amazon Shared Responsibility model would I think already cover the "AWS takes care of the base, 'you' take care of what is running" (but I confess I haven't re-read it in a number of years)
    >
    > -ash
    >
    > On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
    >
    > Irrespective of personal categorization of the managed offerings Airflow-ness, there are obligations to adhere to a security bar and securing against any attack vectors a UI feature can introduce – and this will be true for any cloud service provider. I want to clarify that we were not suggesting to change any assumptions in current way of packaging providers but merely citing that we cannot use equivalence to earlier mono repo and add all 60+ of them on base image.
    >
    >
    >
    > Going back to the original discussion, we are in the process of pre-installing providers with Apache 2 license right away and others will be added (with approved exception) based on user demand.
    >
    >
    >
    > From: Ash Berlin-Taylor <as...@apache.org>
    > Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
    > Date: Wednesday, June 16, 2021 at 1:11 AM
    > To: "dev@airflow.apache.org" <de...@airflow.apache.org>
    > Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services
    >
    >
    >
    > On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
    >
    > Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person on PMC but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities on the flask-app-builder and the underlying python runtime.
    >
    >
    >
    > I'm sorry, I don't agree with this summary.
    >
    >
    >
    > Airflow's job is to run user submitted code, and to allow the UI to be pluggable.
    >
    >
    >
    > Are you providing Airflow, or an Airflow like service?



    --
    +48 660 796 129


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Jarek Potiuk <ja...@potiuk.com>.
I agree that this thread is probably not the right place to categorize
the offering, but I also agree with Ash that we should get a better
understanding of the risks involved.

I think I "feel" where it comes from and intuitively see that you
might want to add extra layers of precaution (and likely follow
pressure from the internal security teams), but Ash's point is also
quite important. We should get to the bottom of it, and
if there are some real threats that we are not aware of, I think
sharing details on private@airflow.apache.org is the right thing to
do.

Maybe we will find that other users of Airflow are also at risk and we
might want to protect them (all managed services, but also
individual installations) in the future by introducing some changes in
this model.

BTW. Subash - you do not need to have a subscription to write to
private@airflow.apache.org. Just send an email with the details and
we will get it, and we will be able to keep you in the discussion that
follows. Also, information for your security team:
https://www.apache.org/dev/pmc.html#mailing-list-private . One of the
main purposes of the private@ mailing list is pre-disclosing security
problems related to the project. And we are all obliged as PMC members
(and all ASF members who read the list as well) to not disclose what is
discussed there.

J,

On Fri, Jun 18, 2021 at 4:04 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> No one as yet explained what the security concerns actually are? Is there some concrete thing that is a worry, is it merely a concern that more things installed = marginally more risky?
>
> The blast radius is limited to a single Airflow deployment, and access is I assume sufficiently gated behind IAM perms anyway?
>
> By not letting users install extra modules in to the webserver image you are also removing their ability to use third party providers, such as these
>
> https://github.com/great-expectations/airflow-provider-great-expectations
> https://github.com/fivetran/airflow-provider-fivetran
> https://github.com/anyscale/airflow-provider-ray
>
> -- and there are only going to be more of these over time.
>
> Not to mention this blocks UI plugins entirely.
>
> I don't quite understand why MWAA concerns itself with exactly what is being installed in the webserver image on top of Airflow -- the Amazon Shared Responsibility model would I think already cover the "AWS takes care of the base, 'you' take care of what is running" (but I confess I haven't re-read it in a number of years)
>
> -ash
>
> On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
>
> Irrespective of personal categorization of the managed offerings Airflow-ness, there are obligations to adhere to a security bar and securing against any attack vectors a UI feature can introduce – and this will be true for any cloud service provider. I want to clarify that we were not suggesting to change any assumptions in current way of packaging providers but merely citing that we cannot use equivalence to earlier mono repo and add all 60+ of them on base image.
>
>
>
> Going back to the original discussion, we are in the process of pre-installing providers with Apache 2 license right away and others will be added (with approved exception) based on user demand.
>
>
>
> From: Ash Berlin-Taylor <as...@apache.org>
> Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
> Date: Wednesday, June 16, 2021 at 1:11 AM
> To: "dev@airflow.apache.org" <de...@airflow.apache.org>
> Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services
>
>
>
> On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:
>
> Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person on PMC but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities on the flask-app-builder and the underlying python runtime.
>
>
>
> I'm sorry, I don't agree with this summary.
>
>
>
> Airflow's job is to run user submitted code, and to allow the UI to be pluggable.
>
>
>
> Are you providing Airflow, or an Airflow like service?



-- 
+48 660 796 129

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Ash Berlin-Taylor <as...@apache.org>.
No one has yet explained what the security concerns actually are. Is
there some concrete thing that is a worry, or is it merely a concern that
more things installed = marginally more risky?

The blast radius is limited to a single Airflow deployment, and access
is, I assume, sufficiently gated behind IAM perms anyway?

By not letting users install extra modules into the webserver image
you are also removing their ability to use third party providers, such
as these:

<https://github.com/great-expectations/airflow-provider-great-expectations>
<https://github.com/fivetran/airflow-provider-fivetran>
<https://github.com/anyscale/airflow-provider-ray>

-- and there are only going to be more of these over time.
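
For readers wondering why such a provider has to be importable in the webserver at all: in Airflow 2.0 a provider's hook class describes its own connection form, for instance through a static `get_ui_field_behaviour()` method. The following is a minimal sketch of that shape and is not code from any of the providers linked above. The class name, connection type, labels, and the `render_form_fields` helper are hypothetical, and the sketch deliberately avoids importing Airflow so that it runs on its own.

```python
from typing import Any, Dict, List

# Generic connection fields the form can show (illustrative subset).
STANDARD_FIELDS = ["host", "schema", "login", "password", "port", "extra"]


class ExampleServiceHook:
    """Hypothetical third-party hook; a real one would subclass
    airflow.hooks.base.BaseHook and ship inside a provider package."""

    conn_type = "example_service"
    hook_name = "Example Service"

    @staticmethod
    def get_ui_field_behaviour() -> Dict[str, Any]:
        # Tells the connection form which generic fields to hide
        # and how to relabel the ones that remain.
        return {
            "hidden_fields": ["port", "schema", "extra"],
            "relabeling": {"login": "API key ID", "password": "API key secret"},
        }


def render_form_fields(hook_class: type) -> List[str]:
    """Hypothetical helper mimicking what a UI would do with the hook's answer."""
    behaviour = hook_class.get_ui_field_behaviour()
    hidden = set(behaviour["hidden_fields"])
    labels = behaviour["relabeling"]
    return [labels.get(field, field) for field in STANDARD_FIELDS if field not in hidden]


fields = render_form_fields(ExampleServiceHook)
print(fields)
```

Because the customisation lives on the hook class itself, a webserver that cannot import the provider has nothing to call, which is the situation this thread is about.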

Not to mention this blocks UI plugins entirely.

I don't quite understand why MWAA concerns itself with exactly what is
being installed in the webserver image on top of Airflow -- the Amazon
Shared Responsibility model would, I think, already cover this: "AWS takes
care of the base, 'you' take care of what is running" (but I confess I
haven't re-read it in a number of years).

-ash

On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" 
<su...@amazon.com.INVALID> wrote:
> Irrespective of personal categorization of the managed offerings 
> Airflow-ness, there are obligations to adhere to a security bar and 
> securing against any attack vectors a UI feature can introduce – 
> and this will be true for any cloud service provider. I want to 
> clarify that we were not suggesting to change any assumptions in 
> current way of packaging providers but merely citing that we cannot 
> use equivalence to earlier mono repo and add all 60+ of them on base 
> image.
> 
> 
> 
> Going back to the original discussion, we are in the process of 
> pre-installing providers with Apache 2 license right away and others 
> will be added (with approved exception) based on user demand.
> 
> 
> 
> *From:*Ash Berlin-Taylor <as...@apache.org>
> *Reply-To:*"dev@airflow.apache.org" <de...@airflow.apache.org>
> *Date:*Wednesday, June 16, 2021 at 1:11 AM
> *To:*"dev@airflow.apache.org" <de...@airflow.apache.org>
> *Subject:*RE: [EXTERNAL] [DISCUSS] Managing provider Connections via 
> UI in managed Airflow services
> 
> 
> 
> On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" 
> <su...@amazon.com.INVALID> wrote:
> 
>> Regarding security constraints on why we disallow plugins and 
>> requirements on the webserver, I will have to discuss this in person 
>> on PMC but on a high level this comes down to remote code execution 
>> prevention on managed instances, opening possibilities of exploiting 
>> vulnerabilities on the flask-app-builder and the underlying python 
>> runtime.
>> 
> 
> 
> I'm sorry, I don't agree with this summary.
> 
> 
> 
> Airflow's job is to run user submitted code, and to allow the UI to 
> be pluggable.
> 
> 
> 
> Are you providing Airflow, or an Airflow like service?
> 


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Canapathy, Subash" <su...@amazon.com.INVALID>.
Irrespective of anyone's personal categorization of the managed offerings' Airflow-ness, there are obligations to adhere to a security bar and to secure against any attack vectors a UI feature can introduce – and this will be true for any cloud service provider. I want to clarify that we were not suggesting changing any assumptions in the current way of packaging providers, but merely pointing out that we cannot treat them as equivalent to the earlier monorepo and add all 60+ of them to the base image.

Going back to the original discussion, we are in the process of pre-installing providers under the Apache 2 license right away, and others will be added (with an approved exception) based on user demand.

From: Ash Berlin-Taylor <as...@apache.org>
Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Date: Wednesday, June 16, 2021 at 1:11 AM
To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services


On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" <su...@amazon.com.INVALID> wrote:

Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person on PMC but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities on the flask-app-builder and the underlying python runtime.

I'm sorry, I don't agree with this summary.

Airflow's job is to run user submitted code, and to allow the UI to be pluggable.

Are you providing Airflow, or an Airflow like service?

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Ash Berlin-Taylor <as...@apache.org>.
On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" 
<su...@amazon.com.INVALID> wrote:
> Regarding security constraints on why we disallow plugins and 
> requirements on the webserver, I will have to discuss this in person 
> on PMC but on a high level this comes down to remote code execution 
> prevention on managed instances, opening possibilities of exploiting 
> vulnerabilities on the flask-app-builder and the underlying python 
> runtime.

I'm sorry, I don't agree with this summary.

Airflow's job is to run user-submitted code, and to allow the UI to be
pluggable.

Are you providing Airflow, or an Airflow-like service?


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by "Canapathy, Subash" <su...@amazon.com.INVALID>.
@Ash Berlin-Taylor – I don’t think that is entirely true. In 1.10 the connection templates code was part of the Flask application and not bundled with the provider. Managed services took the webserver baseline as is and let customers decide on additions like FB-business, Oracle, etc., without bundling them into the managed service software, per AWS compliance guidelines. In 2.0, if we bake in all the providers, it will mean that we are baking in their dependencies along with them.

E.g., search for “facebook-business” in the following files:
1.10 constraints file – https://github.com/apache/airflow/blob/constraints-1-10/constraints-3.7.txt (does not have facebook-business as dependency)
2.0 constraints file - https://github.com/apache/airflow/blob/constraints-2-0/constraints-3.7.txt (this contains facebook-business as dependency)

This is one example; I can pull in other LGPL ones similarly. The point is that the connections code from the Flask app now lives elsewhere, thereby bringing in the requirements for everything related to the provider as one package.
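
The same check can be scripted rather than done by eye. The sketch below is only an illustration, assuming the constraints files contain plain `name==version` lines; the `pinned_version` helper is hypothetical and the sample text and version numbers are made up, not copied from the real files.

```python
import re
from typing import Optional


def _normalize(name: str) -> str:
    # Treat "-", "_" and "." as equivalent and ignore case in package names.
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_version(constraints_text: str, package: str) -> Optional[str]:
    """Return the version pinned for `package`, or None if it is not listed."""
    wanted = _normalize(package)
    for raw_line in constraints_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        if _normalize(name.strip()) == wanted:
            return version.strip()
    return None


# Made-up sample contents standing in for the two constraints files.
sample_1_10 = "# sample only\nFlask==1.1.2\noracledb-example==1.0\n"
sample_2_0 = "# sample only\nFlask==1.1.2\nfacebook-business==10.0.0\n"

in_1_10 = pinned_version(sample_1_10, "facebook-business")
in_2_0 = pinned_version(sample_2_0, "facebook_business")
print(in_1_10, in_2_0)
```

To run it against the real files, download their text from the two URLs above and pass it in place of the sample strings.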

Regarding the security constraints behind why we disallow plugins and requirements on the webserver, I will have to discuss this in person with the PMC, but at a high level this comes down to preventing remote code execution on managed instances, which would open up possibilities for exploiting vulnerabilities in Flask-AppBuilder and the underlying Python runtime. There are two levels of isolation: one is the single tenancy of environments in MWAA under separate VPCs, and the second is Fargate, which prevents exploits from breaking out of the container boundaries into the hypervisor. Even with those, our security team unearthed other possible exploits in penetration testing, and that led to this decision.

Thanks
Subash

From: Ash Berlin-Taylor <as...@apache.org>
Reply-To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Date: Tuesday, June 15, 2021 at 7:18 AM
To: "dev@airflow.apache.org" <de...@airflow.apache.org>
Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services


Hi Subash,

If your concern is about licensing then you have a false sense of compliance in 1.10 -- the dependencies for the "providers" between 1.10 and 2.0 haven't really changed -- the same (L)GPL, Facebook, etc. licensed modules are still there in 1.10 and the 2.0 providers.

My same question to you: can you elaborate (privately if necessary) on what the security concerns here are?

-ash


On Mon, Jun 14 2021 at 19:59:25 -0000, Subash Canapathy <su...@gmail.com> wrote:

Hi Jarek

Thank you for surfacing this issue for discussion. The major hurdle for managed services, apart from the security constraints, is on the licensing side. Previously, when the code needed for connection templates was part of Airflow, we were able to bundle it as a solution because the code was under the Apache v2 license. Now that it has been separated out into provider packages, those come with dependencies that do not have "blessed" licenses that allow bundling them into a managed service. I am sure GCP folks have similar restrictions on why they cannot add all 60+ providers as is into the base image. We recently did the manual exercise of assessing each of those provider packages and their dependencies, and only 20 of them made the cut for not having to use additional licenses like the Facebook license, LGPL etc.

Thanks
Subash Canapathy

On 2021/06/14 16:28:46, Ash Berlin-Taylor <as...@apache.org> wrote:

Can you elaborate (privately if you have to) on what the security concerns are? Since as I understand it the web server is per deployment, so anything should be limited to one customer/user/deployment.

There is also the new "test connection" feature that will need the provider code installed to work.

Then there's the issue of third party connections - of which there are only going to be more over time.

-ash

On 14 June 2021 16:35:42 BST, Eugen Kosteev <eu...@kosteev.com> wrote:
>Hi Jarek.
>
>Thanks for the discussion.
>The issue with Connections management in the web server that you described
>has indeed affected Cloud Composer in the released preview image versions of
>Airflow 2.0.1 (link to public issue
>https://issuetracker.google.com/issues/190189297). And as you stated, we do
>not install pypi packages in web server image mostly because of security
>concerns.
>
>As a temporary workaround we baked all connections (list of them with their
>widgets pickled and stored inside) into a web server image, so that
>customers can add/edit them (even though not all providers packages are
>pre-installed). This is a temporary workaround that we came up with for now
>and we are looking for a long-term solution.
>
>Our thoughts/ideas for alternative solutions:
>1. We do not want to pre-install all providers packages so as to not generate
>unnecessary python dependencies. Or maybe we could do this only for web
>server images (not scheduler/worker) but then it is not clear if it is a
>good idea to have such a discrepancy between pypi dependencies in web
>server vs scheduler/worker images.
>2. Downloading and baking in providers packages (wheel files) into docker
>image and installing customer specific/required version on demand looks
>infeasible, taking into account number of providers, their versions and
>their dependencies.
>
>- Eugene
>
>On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Dear Airflow community,
>>
>> Here is another result of discussions. I would like to raise an attention
>> to potential Connection management problems that might affect managed
>> services for Airflow 2.0 and some providers.
>>
>> With Airflow 2.0, connection UI "customisations" are baked into the
>> provider package and in order to see - for example Postgres connection in
>> the UI, you need to have the "postgres" provider installed in the Webserver.
>>
>> As far as I know some of the Managed Airflow services (MWAA, Composer,
>> possibly other) do not currently allow their users installation of
>> additional packages in the webserver (the webserver container is different
>> than the scheduler/worker). This makes it impossible to configure/edit
>> provider connections via UI (unless those providers are pre-installed in
>> the webserver image).
>>
>> While this is understandable from security point of view to forbid "any"
>> package installation, I think the official
>> "apache-airflow-providers-*" should be allowlisted for those images and
>> installed or otherwise made available (for example via pre-installing all
>> providers in the webserver image if this is not possible from security
>> point of view to rebuild the image dynamically)
>>
>> I wonder what people (and especially the people from MWAA, Composer team)
>> think about it - do I get it right about the security concerns? Any other
>> comments?
>>
>>
>> J.
>>
>> --
>> +48 660 796 129
>>
>
>
>--
>Eugene

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Sumit Maheshwari <ms...@apache.org>.
A question for AWS & GCP folks: how would a user be able to use some custom
dependencies (pip packages) in their setups, especially for the previous
Airflow versions, where the Airflow webserver would need all the packages?
If there is a way to solve that, could the same be used for providers as
well?

On Tue, Jun 15, 2021 at 7:46 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Hi Subash,
>
> If your concern is about licensing then you have a false sense of
> compliance  in 1.10 -- the dependencies for the "providers" between 1.10
> and 2.0 haven't really changed -- the same (L)GPL, Facebook etc licensed
> modules are still there in 1.10 and the 2.0 providers.
>
> My same question to you: can iterate (private if necessary) what the
> security concerns here are?
>
> -ash

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Ash Berlin-Taylor <as...@apache.org>.
Hi Subash,

If your concern is about licensing then you have a false sense of
compliance in 1.10 -- the dependencies for the "providers" between
1.10 and 2.0 haven't really changed -- the same (L)GPL, Facebook etc.
licensed modules are still there in both 1.10 and the 2.0 providers.

My same question to you: can you elaborate (privately if necessary) on
what the security concerns here are?

-ash


On Mon, Jun 14 2021 at 19:59:25 -0000, Subash Canapathy 
<su...@gmail.com> wrote:
> Hi Jarek
> 
> Thank you for raising this issue for discussion. The major hurdle
> for managed services, apart from the security constraints, is on the
> licensing side. Previously, when the code needed for connection
> templates was part of Airflow, we were able to bundle it as a
> solution because the code was under the Apache v2 license. Now that
> it is separated out into provider packages, those come with
> dependencies that do not have "blessed" licenses that allow bundling
> them into a managed service. I am sure GCP folks have similar
> restrictions that explain why they cannot add all 60+ providers as is
> into the base image.
>
> We recently did the manual exercise of assessing each of those provider
> packages and their dependencies, and only 20 of them made the cut by
> not pulling in additional licenses such as the Facebook license, LGPL etc.
> 
> Thanks
> Subash Canapathy


Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Subash Canapathy <su...@gmail.com>.
Hi Jarek

Thank you for raising this issue for discussion. The major hurdle for managed services, apart from the security constraints, is on the licensing side. Previously, when the code needed for connection templates was part of Airflow, we were able to bundle it as a solution because the code was under the Apache v2 license. Now that it is separated out into provider packages, those come with dependencies that do not have "blessed" licenses that allow bundling them into a managed service. I am sure GCP folks have similar restrictions that explain why they cannot add all 60+ providers as is into the base image.

We recently did the manual exercise of assessing each of those provider packages and their dependencies, and only 20 of them made the cut by not pulling in additional licenses such as the Facebook license, LGPL etc.
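
For readers who want to try a rough version of such an assessment, here is a minimal sketch, not the process any managed service actually uses. It reads only the declared `License` metadata field of an installed distribution through the standard library's `importlib.metadata`. The helper names and the marker list are hypothetical, and a real audit would also need to walk transitive dependencies and check license classifiers.

```python
from importlib import metadata

# Hypothetical marker list for illustration; a real policy would be far
# more precise than a substring match ("gpl" also matches "LGPL").
FLAGGED_MARKERS = ("gpl",)


def license_of(dist_name):
    """Return the declared License field of an installed distribution,
    or None if the distribution is missing or declares no license."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return meta.get("License")


def needs_review(license_text, markers=FLAGGED_MARKERS):
    """Flag a license string for manual review.

    An undeclared license is flagged too, because the absence of
    metadata says nothing about the actual license.
    """
    if not license_text:
        return True
    lowered = license_text.lower()
    return any(marker in lowered for marker in markers)
```

For example, `needs_review(license_of("some-dependency"))` would flag a dependency whose declared license mentions the GPL family, or which declares nothing at all.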

Thanks
Subash Canapathy

On 2021/06/14 16:28:46, Ash Berlin-Taylor <as...@apache.org> wrote: 
> Can you elaborate (privately if you have to) on what the security concerns are? As I understand it, the web server is per deployment, so anything should be limited to one customer/user/deployment.
> 
> There is also the new "test connection" feature that will need the provider code installed to work.
> 
> Then there's the issue of third party connections - of which there are only going to be more over time.
> 
> -ash
> 
> On 14 June 2021 16:35:42 BST, Eugen Kosteev <eu...@kosteev.com> wrote:
> >Hi Jarek.
> >
> >Thanks for the discussion.
> >The issue with Connections management in the web server that you described
> >has indeed affected Cloud Composer in the released preview image versions of
> >Airflow 2.0.1 (link to public issue
> >https://issuetracker.google.com/issues/190189297). And as you stated, we do
> >not install pypi packages in web server image mostly because of security
> >concerns.
> >
> >As a temporary workaround we baked all connections (list of them with their
> >widgets pickled and stored inside) into a web server image, so that
> >customers can add/edit them (even though not all providers packages are
> >pre-installed). This is a temporary workaround that we came up with for now
> >and we are looking for a long-term solution.
> >
> >Our thoughts/ideas for alternative solutions:
> >1. We do not want to pre-install all provider packages, so as not to generate
> >unnecessary Python dependencies. Or maybe we could do this only for web
> >server images (not scheduler/worker), but then it is not clear whether it is a
> >good idea to have such a discrepancy between PyPI dependencies in web
> >server vs scheduler/worker images.
> >2. Downloading and baking provider packages (wheel files) into the docker
> >image and installing customer specific/required version on demand looks
> >infeasible, taking into account number of providers, their versions and
> >their dependencies.
> >
> >- Eugene
> >
> >On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >> Dear Airflow community,
> >>
> >> Here is another result of discussions. I would like to draw attention
> >> to potential Connection management problems that might affect managed
> >> services for Airflow 2.0 and some providers.
> >>
> >> With Airflow 2.0, connection UI "customisations" are baked into the
> >> provider package and in order to see - for example Postgres connection in
> >> the UI, you need to have the "postgres" provider installed in the Webserver.
> >>
> >> As far as I know some of the Managed Airflow services (MWAA, Composer,
> >> possibly other) do not currently allow their users installation of
> >> additional packages in the webserver (the webserver container is different
> >> than the scheduler/worker). This makes it impossible to configure/edit
> >> provider connections via UI (unless those providers are pre-installed in
> >> the webserver image).
> >>
> >> While this is understandable from security point of view to forbid "any"
> >> package installation, I think the official
> >> "apache-airflow-providers-*" should be allowlisted for those images and
> >> installed or otherwise made available (for example via pre-installing all
> >> providers in the webserver image if this is not possible from security
> >> point of view to rebuild the image dynamically)
> >>
> >> I wonder what people (and especially the people from MWAA, Composer team)
> >> think about it - do I get it right about the security concerns? Any other
> >> comments?
> >>
> >>
> >> J.
> >>
> >> --
> >> +48 660 796 129
> >>
> >
> >
> >-- 
> >Eugene
> 

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Ash Berlin-Taylor <as...@apache.org>.
Can you elaborate (privately if you have to) on what the security concerns are? As I understand it, the web server is per deployment, so anything should be limited to one customer/user/deployment.

There is also the new "test connection" feature that will need the provider code installed to work.

Then there's the issue of third party connections - of which there are only going to be more over time.

-ash

On 14 June 2021 16:35:42 BST, Eugen Kosteev <eu...@kosteev.com> wrote:
>Hi Jarek.
>
>Thanks for the discussion.
>The issue with Connections management in the web server that you described
>has indeed affected Cloud Composer in the released preview image versions of
>Airflow 2.0.1 (link to public issue
>https://issuetracker.google.com/issues/190189297). And as you stated, we do
>not install pypi packages in web server image mostly because of security
>concerns.
>
>As a temporary workaround we baked all connections (list of them with their
>widgets pickled and stored inside) into a web server image, so that
>customers can add/edit them (even though not all providers packages are
>pre-installed). This is a temporary workaround that we came up with for now
>and we are looking for a long-term solution.
>
>Our thoughts/ideas for alternative solutions:
>1. We do not want to pre-install all provider packages, so as not to generate
>unnecessary Python dependencies. Or maybe we could do this only for web
>server images (not scheduler/worker), but then it is not clear whether it is a
>good idea to have such a discrepancy between PyPI dependencies in web
>server vs scheduler/worker images.
>2. Downloading and baking provider packages (wheel files) into the docker
>image and installing customer specific/required version on demand looks
>infeasible, taking into account number of providers, their versions and
>their dependencies.
>
>- Eugene
>
>On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Dear Airflow community,
>>
>> Here is another result of discussions. I would like to draw attention
>> to potential Connection management problems that might affect managed
>> services for Airflow 2.0 and some providers.
>>
>> With Airflow 2.0, connection UI "customisations" are baked into the
>> provider package and in order to see - for example Postgres connection in
>> the UI, you need to have the "postgres" provider installed in the Webserver.
>>
>> As far as I know some of the Managed Airflow services (MWAA, Composer,
>> possibly other) do not currently allow their users installation of
>> additional packages in the webserver (the webserver container is different
>> than the scheduler/worker). This makes it impossible to configure/edit
>> provider connections via UI (unless those providers are pre-installed in
>> the webserver image).
>>
>> While this is understandable from security point of view to forbid "any"
>> package installation, I think the official
>> "apache-airflow-providers-*" should be allowlisted for those images and
>> installed or otherwise made available (for example via pre-installing all
>> providers in the webserver image if this is not possible from security
>> point of view to rebuild the image dynamically)
>>
>> I wonder what people (and especially the people from MWAA, Composer team)
>> think about it - do I get it right about the security concerns? Any other
>> comments?
>>
>>
>> J.
>>
>> --
>> +48 660 796 129
>>
>
>
>-- 
>Eugene

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Daniel Standish <dp...@gmail.com>.
Is it at all feasible to deprecate connection UI customization?  Then
everything can just use `extra` json where the other params fall short.
Seems like an area where the benefit does not outweigh the complexity.  We
could also take the opportunity to deprecate the long `extra` key names
like `extra__google_cloud_platform__keyfile_dict` in favor of simpler ones
e.g. `keyfile_dict`.
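
One way such a key-name deprecation could be handled during a transition is sketched below. This is only an illustration under assumptions: `get_extra_field` is a hypothetical helper, not an Airflow API, and it assumes the long names follow the `extra__<conn_type>__<field>` pattern seen in the example above. It prefers the short key and falls back to the long one.

```python
import json


def get_extra_field(extra_json, conn_type, field, default=None):
    """Read `field` from a connection's `extra` JSON string.

    Prefers the short key (e.g. "keyfile_dict") and falls back to the
    long prefixed key (e.g. "extra__google_cloud_platform__keyfile_dict").
    Hypothetical helper for illustration only.
    """
    extra = json.loads(extra_json) if extra_json else {}
    if field in extra:
        return extra[field]
    long_key = f"extra__{conn_type}__{field}"
    return extra.get(long_key, default)
```

With a helper like this, connections stored with either naming style would keep working while the long names are phased out.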

Re: [DISCUSS] Managing provider Connections via UI in managed Airflow services

Posted by Eugen Kosteev <eu...@kosteev.com>.
Hi Jarek.

Thanks for the discussion.
The issue with Connections management in the web server that you described
has indeed affected Cloud Composer in the released preview image versions of
Airflow 2.0.1 (link to public issue
https://issuetracker.google.com/issues/190189297). And as you stated, we do
not install pypi packages in web server image mostly because of security
concerns.

As a temporary workaround we baked all connections (list of them with their
widgets pickled and stored inside) into a web server image, so that
customers can add/edit them (even though not all providers packages are
pre-installed). This is a temporary workaround that we came up with for now
and we are looking for a long-term solution.

Our thoughts/ideas for alternative solutions:
1. We do not want to pre-install all provider packages, so as not to generate
unnecessary Python dependencies. Or maybe we could do this only for web
server images (not scheduler/worker), but then it is not clear whether it is a
good idea to have such a discrepancy between PyPI dependencies in web
server vs scheduler/worker images.
2. Downloading and baking provider packages (wheel files) into the docker
image and installing customer specific/required version on demand looks
infeasible, taking into account number of providers, their versions and
their dependencies.
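
To see how far a given web server image diverges from the scheduler/worker image, one could list the provider distributions installed in each and compare the results. The following is a minimal sketch using only the standard library's `importlib.metadata`; the function names are hypothetical, and it assumes official providers are published under the `apache-airflow-providers-` name prefix mentioned in this thread.

```python
from importlib import metadata

PROVIDER_PREFIX = "apache-airflow-providers-"


def provider_names(dist_names):
    """Filter an iterable of distribution names down to Airflow
    provider packages, normalized to lowercase-with-dashes and sorted."""
    found = set()
    for name in dist_names:
        if not name:
            continue  # skip distributions with missing Name metadata
        normalized = name.lower().replace("_", "-")
        if normalized.startswith(PROVIDER_PREFIX):
            found.add(normalized)
    return sorted(found)


def installed_providers():
    """Return the provider packages installed in the current environment."""
    return provider_names(
        dist.metadata.get("Name") for dist in metadata.distributions()
    )
```

Running `installed_providers()` in both images and diffing the two lists would show which connection types the web server cannot render.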

- Eugene

On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Dear Airflow community,
>
> Here is another result of discussions. I would like to draw attention
> to potential Connection management problems that might affect managed
> services for Airflow 2.0 and some providers.
>
> With Airflow 2.0, connection UI "customisations" are baked into the
> provider package and in order to see - for example Postgres connection in
> the UI, you need to have the "postgres" provider installed in the Webserver.
>
> As far as I know some of the Managed Airflow services (MWAA, Composer,
> possibly other) do not currently allow their users installation of
> additional packages in the webserver (the webserver container is different
> than the scheduler/worker). This makes it impossible to configure/edit
> provider connections via UI (unless those providers are pre-installed in
> the webserver image).
>
> While this is understandable from security point of view to forbid "any"
> package installation, I think the official
> "apache-airflow-providers-*" should be allowlisted for those images and
> installed or otherwise made available (for example via pre-installing all
> providers in the webserver image if this is not possible from security
> point of view to rebuild the image dynamically)
>
> I wonder what people (and especially the people from MWAA, Composer team)
> think about it - do I get it right about the security concerns? Any other
> comments?
>
>
> J.
>
> --
> +48 660 796 129
>


-- 
Eugene