Posted to dev@airflow.apache.org by Jasper Kahn <ja...@google.com.INVALID> on 2018/07/06 19:05:21 UTC

Initial Design for Supporting fine-grained Connection encryption [Feedback Requested]

Hello,

In support of adding fine-grained Connection encryption (Jira Issue:
https://issues.apache.org/jira/browse/AIRFLOW-2062) I wanted to gather
feedback on a proposed design, as it affects a few different Airflow
components. A full design doc is coming next week.

The end goal is to allow per-Connection encryption (as opposed to the global
fernet key) to support providing containerized tasks with independent
credentials to limit access, and to enable integration with Key Management
Systems.


At a high level, Connection objects will be augmented with 2 additional
fields: `KMS_type` and `KMS_extras`, which are modeled (somewhat) after the
existing `conn_type` and `extras` fields. Each connection can be flagged as
"independently encrypted", which then prompts the user to pick a KMS (from
a predefined list, like Connection type) and enter the relevant
authentication and metadata that KMS requires to operate (mirroring how
choosing a Connection type results in additional configuration).
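
To make the two proposed fields concrete, here is a minimal sketch of what a
stored record might look like with and without independent encryption. The
`ConnectionRecord` class, the `example_kms` type string, and the keys inside
`kms_extras` are illustrative assumptions; they are not part of the proposal
or of Airflow's actual Connection model.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionRecord:
    """Hypothetical, simplified stand-in for Airflow's Connection model."""

    conn_id: str
    conn_type: str
    password: Optional[str] = None    # ciphertext when kms_type is set
    extra: Optional[str] = None       # existing free-form field
    kms_type: Optional[str] = None    # proposed: which KMS client to use
    kms_extras: Optional[str] = None  # proposed: JSON the KMS client needs

    @property
    def independently_encrypted(self) -> bool:
        return self.kms_type is not None


# A connection that relies only on the global fernet key (today's behaviour).
plain = ConnectionRecord(
    conn_id="reports_db",
    conn_type="postgres",
    password="<fernet-encrypted>",
)

# A connection flagged as independently encrypted. All values are made up.
wrapped = ConnectionRecord(
    conn_id="billing_db",
    conn_type="postgres",
    password="<encrypted with K_conn>",
    kms_type="example_kms",
    kms_extras=json.dumps(
        {"key_name": "example-key",
         "wrapped_k_conn": "<K_conn encrypted by the KMS>"}
    ),
)
```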

The credentials to authenticate with the KMS can either be manually placed
(like some key files for Connections are now) or, in the case of
containerized workers, injected as a key file (through file system mapping)
or environment variable on a per-worker basis. These changes are primarily
in support of the second (containerized workers) model.

When creating an encrypted Connection, Airflow will generate a
cryptographic key (likely AES, possibly a separate fernet key) for that
connection and encrypt the Connection fields. It will then encrypt that key
(K_conn) using the KMS.

KMS communication happens through KMSClients, which are implemented very
similarly to Connection types and Hooks, with a mapping from KMS_type to a
particular Client. New clients can be added by the community (as with
hooks/Connection Types). The API for a KMSClient is simple: Encryption and
Decryption. The `encrypt` method would take in K_conn and the configuration
data, encrypt K_conn through the KMS, and return JSON to be stored in the
`KMS_extras` field. `decrypt` is passed this `KMS_extras` JSON, decrypts K_conn
through the KMS, and returns K_conn to be used to decrypt the Connection
data. After both operations, K_conn is purged from memory.
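
The client interface and the type-to-client mapping described above could be
sketched roughly as follows. Everything here is an assumption made for
illustration: the class names, the `KMS_CLIENTS` registry, and the
`get_kms_client` helper are hypothetical, and `InMemoryKMSClient` is a test
double that performs no real encryption. Encrypting the Connection fields
with K_conn is left out.

```python
import json
import secrets
from abc import ABC, abstractmethod
from typing import Dict, Type


class KMSClient(ABC):
    """Hypothetical base class: wraps and unwraps a per-connection key."""

    @abstractmethod
    def encrypt(self, k_conn: bytes, config: dict) -> str:
        """Wrap k_conn via the KMS; return JSON for the KMS_extras field."""

    @abstractmethod
    def decrypt(self, kms_extras: str) -> bytes:
        """Unwrap and return k_conn from the stored KMS_extras JSON."""


class InMemoryKMSClient(KMSClient):
    """Test double, NOT real encryption.

    Keeps keys in a class-level dict and hands back an opaque handle,
    standing in for a remote KMS that never reveals its master key.
    """

    _vault: Dict[str, bytes] = {}

    def encrypt(self, k_conn: bytes, config: dict) -> str:
        handle = secrets.token_hex(8)
        self._vault[handle] = k_conn
        return json.dumps({"key_name": config["key_name"], "handle": handle})

    def decrypt(self, kms_extras: str) -> bytes:
        return self._vault[json.loads(kms_extras)["handle"]]


# Mapping from KMS_type to client class, analogous to conn_type -> Hook.
KMS_CLIENTS: Dict[str, Type[KMSClient]] = {"in_memory": InMemoryKMSClient}


def get_kms_client(kms_type: str) -> KMSClient:
    """Look up and instantiate the client registered for kms_type."""
    return KMS_CLIENTS[kms_type]()


k_conn = secrets.token_bytes(32)  # per-connection key
kms_extras = get_kms_client("in_memory").encrypt(
    k_conn, {"key_name": "example-key"}
)
recovered = get_kms_client("in_memory").decrypt(kms_extras)
```

The purge step is not shown: Python `bytes` objects are immutable, so a real
implementation would need some other way to limit how long K_conn stays in
memory.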

Decryption would be implemented where the Connection is loaded from the
database or environment. This makes the presence of per-Connection
encryption transparent to any calling code, much like the fernet encryption
works now.
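
One way to picture that transparency is a single load-time function that
callers use whether or not a row is independently encrypted. This is only a
sketch under stated assumptions: `load_password`, `unwrap_key`, and
`decrypt_field` are hypothetical names, the row is a plain mapping rather
than Airflow's real model, and the existing global fernet handling is
omitted.

```python
from typing import Callable, Mapping, Optional


def load_password(
    row: Mapping[str, Optional[str]],
    unwrap_key: Callable[[str, str], bytes],
    decrypt_field: Callable[[bytes, str], str],
) -> Optional[str]:
    """Return the usable password for a stored connection row.

    Callers get the same kind of result whether or not the row is
    independently encrypted, which is what keeps the feature transparent.
    """
    if row.get("kms_type") is None:
        # Not independently encrypted: the existing handling applies
        # (global fernet decryption is omitted from this sketch).
        return row.get("password")
    # Recover K_conn through the KMS client, then decrypt the field with it.
    k_conn = unwrap_key(row["kms_type"], row["kms_extras"])
    return decrypt_field(k_conn, row["password"])
```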


As mentioned, all feedback and criticism is welcome to try to improve this
design. Thanks!

Jasper Kahn

Re: Initial Design for Supporting fine-grained Connection encryption [Feedback Requested]

Posted by Ash Berlin-Taylor <as...@firemirror.com>.
A KMS has nothing strictly to do with Kubernetes: both Google Cloud and AWS have a Key Management System. Putting the extra fields in the JSON relies on `extra` being JSON, which it might not be in every case.

Comments inline, Jasper.

The end goal of only allowing certain tasks to decrypt connections is a good one, and sounds useful for a multi-team/multi-tenant Airflow. Even without containerised workers, KMS encryption could make things a bit better (although it provides no further security over the current setup, as any Airflow process would be able to impersonate another DAG worker and get other secrets).


> On 7 Jul 2018, at 08:11, Shah Altaf <me...@gmail.com> wrote:
> 
> Hi, my feedback is that this feels far too k8s-specific. There would now be
> extra fields in the connection form (and CLI) that are
> hosting/implementation-specific and not at all agnostic. These could
> probably go as additional params in the existing extra field's JSON. That
> would avoid any confusion about what those extra fields are, and any
> k8s users who want that specific implementation would have a place to put it.
> 
> 
> 
> 
> On Fri, Jul 6, 2018 at 9:02 PM Jasper Kahn <ja...@google.com.invalid>
> wrote:
> 
>> Hello,
>> 
>> In support of adding fine-grained Connection encryption (Jira Issue:
>> https://issues.apache.org/jira/browse/AIRFLOW-2062) I wanted to gather
>> feedback on a proposed design, as it affects a few different Airflow
>> components. A full design doc is coming next week.
>> 
>> The end goal is to allow per-Connection encryption (as opposed to the global
>> fernet key) to support providing containerized tasks with independent
>> credentials to limit access, and to enable integration with Key Management
>> Systems.
>> 
>> 
>> At a high level, Connection objects will be augmented with 2 additional
>> fields: `KMS_type` and `KMS_extras`, which are modeled (somewhat) after the
>> existing `conn_type` and `extras` fields. Each connection can be flagged as
>> "independently encrypted", which then prompts the user to pick a KMS (from
>> a predefined list, like Connection type) and enter the relevant
>> authentication and metadata that KMS requires to operate (mirroring how
>> choosing a Connection type results in additional configuration).

You don't mention where the KMSs will be defined for the user to pick from. Will each one be a connection itself?

Do we need totally new fields? Would a single extra flag/type field be enough? In the case where kms_conn_id is set, what would be in the current `extra` field? Although unlikely, might we want to support using two different KMS ids concurrently? (In that case a `kms_type` isn't unique enough.)

Perhaps example values for a connection that is KMS-encrypted, and for one that isn't, would help make this clearer.

>> 
>> The credentials to authenticate with the KMS can either be manually placed
>> (like some key files for Connections are now) or, in the case of
>> containerized workers, injected as a key file (through file system mapping)
>> or environment variable on a per-worker basis. These changes are primarily
>> in support of the second (containerized workers) model.
>> 
>> When creating an encrypted Connection, Airflow will generate a
>> cryptographic key (likely AES, possibly a separate fernet key) for that
>> connection and encrypt the Connection fields. It will then encrypt that key
>> (K_conn) using the KMS.

With the AWS KMS I think it's possible to let it handle this process entirely: you just give it the data to de/encrypt. Where a KMS supports that, we should hand the entire process off to it. The less crypto code we have to write ourselves, the better!
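
For reference, handing a small secret straight to AWS KMS could look roughly
like the sketch below. It assumes `kms` is a boto3 KMS client (created with
`boto3.client("kms")`); the function names, the key alias, and the context
keys are made up for illustration. The AWS KMS `Encrypt` operation accepts
at most 4096 bytes of plaintext, so this only suits small values such as
passwords.

```python
from typing import Dict


def kms_encrypt(kms, key_id: str, plaintext: bytes,
                context: Dict[str, str]) -> bytes:
    """Encrypt a small secret directly with AWS KMS."""
    response = kms.encrypt(
        KeyId=key_id,
        Plaintext=plaintext,
        EncryptionContext=context,
    )
    return response["CiphertextBlob"]


def kms_decrypt(kms, ciphertext: bytes, context: Dict[str, str]) -> bytes:
    """Decrypt; KMS rejects the call unless the same context is supplied."""
    response = kms.decrypt(
        CiphertextBlob=ciphertext,
        EncryptionContext=context,
    )
    return response["Plaintext"]


# Usage (needs AWS credentials, so it is not executed here):
#   import boto3
#   kms = boto3.client("kms")
#   ctx = {"conn_id": "billing_db"}
#   blob = kms_encrypt(kms, "alias/example-airflow", b"s3cret", ctx)
#   assert kms_decrypt(kms, blob, ctx) == b"s3cret"
```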

>> 
>> KMS communication happens through KMSClients, which are implemented very
>> similarly to Connection types and Hooks, with a mapping from KMS_type to a
>> particular Client. New clients can be added by the community (as with
>> hooks/Connection Types). The API for a KMSClient is simple: Encryption and
>> Decryption. The `encrypt` method would take in K_conn and the configuration
>> data, encrypt K_conn through the KMS, and return JSON to be stored in the
>> `KMS_extras` field. `decrypt` is passed this `KMS_extras` JSON, decrypts K_conn
>> through the KMS, and returns K_conn to be used to decrypt the Connection
>> data. After both operations, K_conn is purged from memory.

Is the "configuration data" the encryption/decryption context that would be passed to the AWS KMS APIs, for example? (I use the AWS one as that is the one I'm most familiar with.)

>> 
>> Decryption would be implemented where the Connection is loaded from the
>> database or environment. This makes the presence of per-Connection
>> encryption transparent to any calling code, much like the fernet encryption
>> works now.
>> 
>> 
>> As mentioned, all feedback and criticism is welcome to try to improve this
>> design. Thanks!
>> 
>> Jasper Kahn
>> 


Re: Initial Design for Supporting fine-grained Connection encryption [Feedback Requested]

Posted by Shah Altaf <me...@gmail.com>.
Hi, my feedback is that this feels far too k8s-specific. There would now be
extra fields in the connection form (and CLI) that are
hosting/implementation-specific and not at all agnostic. These could
probably go as additional params in the existing extra field's JSON. That
would avoid any confusion about what those extra fields are, and any
k8s users who want that specific implementation would have a place to put it.



