You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/27 15:16:56 UTC

[GitHub] [airflow] dstandish opened a new pull request #19857: Enable json serialization for secrets backend

dstandish opened a new pull request #19857:
URL: https://github.com/apache/airflow/pull/19857


   Previously in general we could only store connections in the Airflow URI format.  With this change we can serialize as json.  The Airflow URI format can be very tricky to work with and although we have for some time had a convenience method Connection.get_uri, using json is just simpler.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1057055263


   > I'm working on it. But it's ultimately gonna require a pretty substantial refactor of the 'managing connections' documentation. I'd kindof prefer to add that in a separate PR if you're OK with that.
   
   Well actually a clarification, I did add minimal docs changes but wanted to save the managing conns refactor for another PR but yes i'll continue to try to work on that while this remains open.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810225522



##########
File path: airflow/cli/commands/connection_command.py
##########
@@ -82,25 +83,35 @@ def connections_list(args):
         )
 
 
+def _connection_to_dict(conn: Connection) -> dict:
+    return dict(
+        conn_type=conn.conn_type,
+        description=conn.description,
+        login=conn.login,
+        password=conn.password,
+        host=conn.host,
+        port=conn.port,
+        schema=conn.schema,
+        extra=conn.extra,

Review comment:
       it doesn't matter what is stored in `extra` because it is a string and when we serialize it we don't make any assumptions about what it contains.
   
   when we deserialize it, it is again a string and we pass it to Connection as a kwarg.
   
   So yes suppose we have this connection
   
   ```
   Connection(extra=json.dumps({'hi': 'bye'}))
   ```
   So the CLI exporter will export this like so
   ```
   '{"extra": "{\\"hi\\": \\"bye\\"}"}'
   ```
   So when we try to parse this again we do json.loads, and now we get a dict where `extra` is a plain json string, as desired:
   ```
   {'extra': '{"hi": "bye"}'}
   ```
   So, everything is cool right?
   There's one more detail here though.
   What we _do also_ support here, is serializing airflow conn extra fully as object.
   By that I mean, storing _this_ uri _is permissible_:
   ```
   '{"extra": {"hi": "bye"}}'
   ```
   And the reason we can support this is, if it's stored like this, `extra` will be `dict` when the value is loaded.
   We just can't _serialize_ in this way by default because there is no guarantee that `extra` will be json.
   
   So in sum, we support `[obj, str] > Connection.extra`, but when _generating_ the value in CLI export, we have to only support `Connection.extra -> str`.
   
   I _wish_ this were not the case, but because we don't require extra to be json (maybe until 3.0?) i think that's how it must be.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810320940



##########
File path: airflow/secrets/base_secrets.py
##########
@@ -40,10 +47,46 @@ def build_path(path_prefix: str, secret_id: str, sep: str = "/") -> str:
         """
         return f"{path_prefix}{sep}{secret_id}"
 
+    def get_conn_value(self, conn_id: str) -> Optional[str]:
+        """
+        Retrieve from Secrets Backend a string value representing the Connection object.
+
+        If the client your secrets backend uses already returns a python dict, you should override
+        ``get_connection`` instead.
+
+        :param conn_id: connection id
+        """
+        raise NotImplementedError
+
+    def _deserialize_connection(self, conn_id, value):

Review comment:
       sure.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1068503832


   thanks @potiuk happy to get this one over the finish line


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1069480919


   ok thanks @eladkal -- i can take a look at this.
   
   can you give me any examples of where you are seeing this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follows a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is a value, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view.
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who bothers, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _CMD"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user, host, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but credential is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put (`/connections/my_secret_password` is the "key" to use when querying Secret Backend):
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1040528353


   > Is it like .. ready for review @dstandish :) ?
   
   yes i believe so 😅 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish merged pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish merged pull request #19857:
URL: https://github.com/apache/airflow/pull/19857


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810249514



##########
File path: airflow/secrets/environment_variables.py
##########
@@ -30,8 +31,21 @@ class EnvironmentVariablesBackend(BaseSecretsBackend):
     """Retrieves Connection object and Variable from environment variable."""
 
     def get_conn_uri(self, conn_id: str) -> Optional[str]:
-        environment_uri = os.environ.get(CONN_ENV_PREFIX + conn_id.upper())
-        return environment_uri
+        """
+        Return URI representation of Connection conn_id
+        :param conn_id: the connection id
+        :return: deserialized Connection
+        """
+        warnings.warn(

Review comment:
       if `get_connection` is doing the calling, it will find `get_conn_value` and it will not call this method.  so there should be no double warning.
   
   but we can't just remove this method right?  we have to deprecate it.  so i think this warning needs to stay no?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810229374



##########
File path: airflow/secrets/base_secrets.py
##########
@@ -26,8 +26,15 @@
 class BaseSecretsBackend(ABC):
     """Abstract base class to retrieve Connection object given a conn_id or Variable given a key"""
 
-    def __init__(self, **kwargs):
-        pass
+    def __init__(self, connection_serialization_format: Optional[str] = None, **kwargs):
+        from airflow.configuration import conf  # must be local to avoid circular import

Review comment:
       yes, the problem is that secrets can now be used to override a few config items.  so what do you do when secrets backend needs config!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follow a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is important, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) from  Airflow side to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "credentials" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who thinks this is wrong and won't convince others, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check it with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follows a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is a value, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who bothers, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follow a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is a value, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who thinks this is wrong and won't convince others, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check it with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1068373987


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980718633


   @potiuk  You don't necessary need to write a custom secrets backend in order to have secrets rotation.  E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764).  and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   that example:
   
   > And the tools that the organisation has treats those two separately
   
   is that really all that common?
   
   i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation.  i think it might be best to leave it to the external tools.
   
   and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow.  e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   --- 
   
   separately, what do you think of the general idea of this PR though?  should i proceed with it you think?  i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follows a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is a value, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it does not limit to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who bothers, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980718633


   @potiuk  You don't necessary need to write a custom secrets backend in order to have secrets rotation.  E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764).  and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   that example:
   
   > And the tools that the organisation has treats those two separately
   
   is that really all that common?
   
   i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation.  i think it might be best to leave it to the external tools.
   
   and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow.  e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   --- 
   
   separately, what do you think of the general idea of this PR though?  should i proceed with it you think?  i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   Already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize the approach.
   
   I considered possibly using a config key in `airflow.cfg` to flip the default expectation from URI to json. And we could do this and put it in `[secrets]`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1069533737


   @dstandish example:
   
   ```
   from datetime import datetime
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   
   default_args = {
       'start_date': datetime(2022, 3, 15),
       'owner': 'airflow',
       'retries': 1
   }
   
   
   def test_hook():
       s3_hook = S3Hook()
       s3_hook.load_file(filename='bla', key='blaa', bucket_name='fff')
   
   
   dag = DAG(dag_id='my_dag', default_args=default_args)
   
   py_test = PythonOperator(task_id='python_task', python_callable=test_hook, dag=dag)
   
   ```
   
   It produce the warning (some other unrelated warnings as well):
   ```
   [2022-03-16, 19:26:09 UTC] {logging_mixin.py:115} WARNING - /opt/***/***/secrets/base_secrets.py:95 PendingDeprecationWarning: This method is deprecated. Please use `***.secrets.environment_variables.EnvironmentVariablesBackend.get_conn_value`.
   [2022-03-16, 19:26:09 UTC] {logging_mixin.py:115} WARNING - /opt/***/***/models/connection.py:420 PendingDeprecationWarning: Method `get_conn_uri` is deprecated. Please use `get_conn_value`.
   [2022-03-16, 19:26:11 UTC] {logging_mixin.py:115} WARNING - /opt/***/***/models/dag.py:1084 SADeprecationWarning: Query.value() is deprecated and will be removed in a future release.  Please use Query.with_entities() in combination with Query.scalar() (deprecated since: 1.4)
   
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1057055263


   > I'm working on it. But it's ultimately gonna require a pretty substantial refactor of the 'managing connections' documentation. I'd kindof prefer to add that in a separate PR if you're OK with that.
   
   Well actually a clarification, I did add some info to docs in security / secrets backend section. But wanted to save the managing conns refactor for another PR but yes i'll continue to try to work on that while this remains open.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1069542431


   got it @eladkal 
   i think i see the issue, and i'll work on a fix today


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _CMD"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user, host, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but credential is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put:
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980718633


   @potiuk  You don't necessary need to write a custom secrets backend in order to have secrets rotation.  E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764).  and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   that example:
   
   > And the tools that the organisation has treats those two separately
   
   is that really all that common?
   
   i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation.  i think it might be best to leave it to the external tools.
   
   and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow.  e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   --- 
   
   separately, what do you think of the general idea of this PR though?  should i proceed with it you think?  i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follows a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is a value, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who thinks this is wrong and won't convince others, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810003012



##########
File path: airflow/models/connection.py
##########
@@ -385,3 +389,17 @@ def get_connection_from_secrets(cls, conn_id: str) -> 'Connection':
                 )
 
         raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")
+
+    @classmethod
+    def from_json(cls, value, conn_id=None) -> 'Connection':
+        kwargs = json.loads(value)
+        extra = kwargs.pop('extra', None)
+        if extra:
+            kwargs['extra'] = extra if isinstance(extra, str) else json.dumps(extra)
+        conn_type = kwargs.pop('conn_type', None)
+        if conn_type:
+            kwargs['conn_type'] = cls._normalize_conn_type(conn_type)
+        port = kwargs.pop('port', None)
+        if port:
+            kwargs['port'] = int(port)

Review comment:
       I think we should explicity catch ValueError here and provide a bit more meaningful message. Otherwise users will have dig in the code what really caused this mysterious ValueError raised in from_json() function

##########
File path: airflow/secrets/environment_variables.py
##########
@@ -30,8 +31,21 @@ class EnvironmentVariablesBackend(BaseSecretsBackend):
     """Retrieves Connection object and Variable from environment variable."""
 
     def get_conn_uri(self, conn_id: str) -> Optional[str]:
-        environment_uri = os.environ.get(CONN_ENV_PREFIX + conn_id.upper())
-        return environment_uri
+        """
+        Return URI representation of Connection conn_id
+        :param conn_id: the connection id
+        :return: deserialized Connection
+        """
+        warnings.warn(

Review comment:
       I tihnk we do not have to have the second warning here. It is already raised in "get_connection" when get_conn_uri is executed as fallback..  

##########
File path: tests/cli/commands/test_connection_command.py
##########
@@ -308,35 +308,32 @@ def test_cli_connections_export_should_export_as_json(self, mock_file_open, mock
         )
         connection_command.connections_export(args)
 
-        expected_connections = json.dumps(
-            {
-                "airflow_db": {
-                    "conn_type": "mysql",
-                    "description": "mysql conn description",
-                    "host": "mysql",
-                    "login": "root",
-                    "password": "plainpassword",
-                    "schema": "airflow",
-                    "port": None,
-                    "extra": None,
-                },
-                "druid_broker_default": {
-                    "conn_type": "druid",
-                    "description": "druid-broker conn description",
-                    "host": "druid-broker",
-                    "login": None,
-                    "password": None,
-                    "schema": None,
-                    "port": 8082,
-                    "extra": "{\"endpoint\": \"druid/v2/sql\"}",
-                },
+        expected_connections = {
+            "airflow_db": {
+                "conn_type": "mysql",
+                "description": "mysql conn description",
+                "host": "mysql",
+                "login": "root",
+                "password": "plainpassword",
+                "schema": "airflow",
+                "port": None,
+                "extra": None,
             },
-            indent=2,
-        )
+            "druid_broker_default": {
+                "conn_type": "druid",
+                "description": "druid-broker conn description",
+                "host": "druid-broker",
+                "login": None,
+                "password": None,
+                "schema": None,
+                "port": 8082,
+                "extra": "{\"endpoint\": \"druid/v2/sql\"}",
+            },
+        }
 
         mock_splittext.assert_called_once()
         mock_file_open.assert_called_once_with(output_filepath, 'w', -1, 'UTF-8', None)
-        mock_file_open.return_value.write.assert_called_once_with(expected_connections)
+        assert json.loads(mock_file_open.return_value.write.call_args[0][0]) == expected_connections

Review comment:
       Nice :). UX is also important for developers.

##########
File path: airflow/secrets/base_secrets.py
##########
@@ -40,10 +47,46 @@ def build_path(path_prefix: str, secret_id: str, sep: str = "/") -> str:
         """
         return f"{path_prefix}{sep}{secret_id}"
 
+    def get_conn_value(self, conn_id: str) -> Optional[str]:
+        """
+        Retrieve from Secrets Backend a string value representing the Connection object.
+
+        If the client your secrets backend uses already returns a python dict, you should override
+        ``get_connection`` instead.
+
+        :param conn_id: connection id
+        """
+        raise NotImplementedError
+
+    def _deserialize_connection(self, conn_id, value):

Review comment:
       I think we should we make it `deserialize_connection` and explicity document it as overrideable by Secret Backend implementation. And add description on that to the Secret Backends. This is a great way to handle custom serialization methods if needed. 

##########
File path: airflow/secrets/base_secrets.py
##########
@@ -26,8 +26,15 @@
 class BaseSecretsBackend(ABC):
     """Abstract base class to retrieve Connection object given a conn_id or Variable given a key"""
 
-    def __init__(self, **kwargs):
-        pass
+    def __init__(self, connection_serialization_format: Optional[str] = None, **kwargs):
+        from airflow.configuration import conf  # must be local to avoid circular import

Review comment:
       Oh yeah. I wish we spit conf and avoid those.

##########
File path: airflow/secrets/base_secrets.py
##########
@@ -52,15 +95,29 @@ def get_connection(self, conn_id: str) -> Optional['Connection']:
         """
         Return connection object with a given ``conn_id``.
 
+        Tries ``get_conn_value`` first and if not implemented, tries ``get_conn_uri``
+
         :param conn_id: connection id
         """
-        from airflow.models.connection import Connection
-
-        conn_uri = self.get_conn_uri(conn_id=conn_id)
-        if not conn_uri:
+        value = None
+
+        # TODO: after removal of ``get_conn_uri`` we should not catch NotImplementedError here
+        with suppress(NotImplementedError):
+            value = self.get_conn_value(conn_id=conn_id)

Review comment:
       Nice way of handling back-compatibility!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810320373



##########
File path: airflow/secrets/base_secrets.py
##########
@@ -26,8 +26,15 @@
 class BaseSecretsBackend(ABC):
     """Abstract base class to retrieve Connection object given a conn_id or Variable given a key"""
 
-    def __init__(self, **kwargs):
-        pass
+    def __init__(self, connection_serialization_format: Optional[str] = None, **kwargs):
+        from airflow.configuration import conf  # must be local to avoid circular import

Review comment:
       do you think there is anything we could do about it?
   i suppose it would be possible to have a `bootstrap_configuration.py` where we parse the secrets backend config and initialize secrets backend before doing everything in configuration.py.  wdyt?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810233976



##########
File path: airflow/cli/commands/connection_command.py
##########
@@ -82,25 +83,35 @@ def connections_list(args):
         )
 
 
+def _connection_to_dict(conn: Connection) -> dict:
+    return dict(
+        conn_type=conn.conn_type,
+        description=conn.description,
+        login=conn.login,
+        password=conn.password,
+        host=conn.host,
+        port=conn.port,
+        schema=conn.schema,
+        extra=conn.extra,

Review comment:
       > We just can't serialize in this way by default because there is no guarantee that extra will be json
   
   Yeah, that's sort of what I was getting it.
   
   I think we should work out a deprecation plan for storing anything but json in extra.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810349152



##########
File path: airflow/models/connection.py
##########
@@ -385,3 +389,17 @@ def get_connection_from_secrets(cls, conn_id: str) -> 'Connection':
                 )
 
         raise AirflowNotFoundException(f"The conn_id `{conn_id}` isn't defined")
+
+    @classmethod
+    def from_json(cls, value, conn_id=None) -> 'Connection':
+        kwargs = json.loads(value)
+        extra = kwargs.pop('extra', None)
+        if extra:
+            kwargs['extra'] = extra if isinstance(extra, str) else json.dumps(extra)
+        conn_type = kwargs.pop('conn_type', None)
+        if conn_type:
+            kwargs['conn_type'] = cls._normalize_conn_type(conn_type)
+        port = kwargs.pop('port', None)
+        if port:
+            kwargs['port'] = int(port)

Review comment:
       doen




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follows a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is a value, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who thinks this is wrong and won't convince others, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check it with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _CMD"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user, host, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but credential is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options to allow to keep connection "metadata" in our metadata DB and kee[ "secret" credentials in Secrets. That seems to use both types of storages in the way they were designed for.
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put (`/connections/my_secret_password` is the "key" to use when querying Secret Backend):
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve from the metadata DB for most of the connection and combine it with credential(s) retrieved from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets). We already ALWAYS have both - metadata available and possible to use when secret is enabled, so operationally it's not a big change.
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-982832152


   ok @potiuk if you're interested would welcome your feedback on the approach taken here w.r.t. adding json support for any backend that serializes to a string value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-981061859


   Sure - don't get me wrong. I do see the incrememental improvement of using json, and I am not at all against it, I just think that it does not solve "real" (and different) problem of some organisations which might want to keep secrets in their own structure rather than following Airflow ways. It was not my intention to block it or contest it - it was more to raise attention to the fact that it might not be solving a more 'painful" issue for our users. Which is perfectly fine - it does not have to. It solves other issues and I agree it is an improvement over the current URI form.
   
   However I do think there is not a lot we should do now about the bigger (IMHO)  issue (unless someone wants to take on the task on trying to generalize the approach that is implemented in AWS secret backend).  And It's good we had this discussion because I had a chance to write down my thoughts about it and realise that what we really need to do is to add a little chapter to simply describe the status-quo and guide the users who have this kind of problem to roll their own backends (which I proposed in #19859)
   
   For now this is I think perfectly fine to keep it this way (and I'd love to hear other's comments on the "json format" as well).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r820063884



##########
File path: docs/apache-airflow/usage-cli.rst
##########
@@ -259,3 +259,91 @@ If you want to preview the commands but not execute them, use option ``--sql-onl
 Options ``--from-revision`` and ``--from-version`` may only be used in conjunction with the ``--sql-only`` option, because when actually *running* migrations we should always downgrade from current revision.
 
 For a mapping between Airflow version and Alembic revision see :doc:`/migrations-ref`.
+
+
+.. _cli-export-connections:
+
+Exporting Connections

Review comment:
       this is relocated from managing connections doc (and linked to from there)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1056452743


   @potiuk 
   
   > Also we need some updates to documentation with that one - describing it.
   
   I'm working on it. But it's ultimately gonna require a pretty substantial refactor of the 'managing connections' documentation.  I'd kindof prefer to add that in a separate PR if you're OK with that.
   
   Separately, one thing I considered is, instead of making "default connection serialization format" a config parameter, we could use a "try / except" approach -- if the value is valid json, use it; otherwise try parsing as URI; otherwise fail.  I'm not sure what's best.  WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1069406427


   I'm starting to see warnings raised about `get_conn_uri`:
   `[2022-03-16, 18:06:38 UTC] {logging_mixin.py:115} WARNING - /opt/***/***/models/connection.py:420 PendingDeprecationWarning: Method `get_conn_uri` is deprecated. Please use `get_conn_value`.`
   
   We have several references in the code to the deprecated method we should fix the references to avoid the warrnings
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1040519027


   Is it like .. ready for review @dstandish  :) ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1044546278


   Also we need some updates to documentation with that one - describing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810239959



##########
File path: airflow/cli/commands/connection_command.py
##########
@@ -82,25 +83,35 @@ def connections_list(args):
         )
 
 
+def _connection_to_dict(conn: Connection) -> dict:
+    return dict(
+        conn_type=conn.conn_type,
+        description=conn.description,
+        login=conn.login,
+        password=conn.password,
+        host=conn.host,
+        port=conn.port,
+        schema=conn.schema,
+        extra=conn.extra,

Review comment:
       > I think we should work out a deprecation plan for storing anything but json in extra.
   
   yeah i think that would be good




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810339748



##########
File path: airflow/secrets/base_secrets.py
##########
@@ -40,10 +47,46 @@ def build_path(path_prefix: str, secret_id: str, sep: str = "/") -> str:
         """
         return f"{path_prefix}{sep}{secret_id}"
 
+    def get_conn_value(self, conn_id: str) -> Optional[str]:
+        """
+        Retrieve from Secrets Backend a string value representing the Connection object.
+
+        If the client your secrets backend uses already returns a python dict, you should override
+        ``get_connection`` instead.
+
+        :param conn_id: connection id
+        """
+        raise NotImplementedError
+
+    def _deserialize_connection(self, conn_id, value):

Review comment:
       i made it "public"
   
   i took a look and the docs for secrets backend itself are pretty sparse.  i'd like to leave it that way for now.  
   
   i realize also that ultimately the "managing connections" page will need a revamp (because much of it assumes you are going to use URI).  but given that that's a larger effort, and nothing about this PR should interfere with any of that guidance (since it should be fully backward compatible), i'd like to defer that docs revamp for a followup if that sounds OK to you.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810320373



##########
File path: airflow/secrets/base_secrets.py
##########
@@ -26,8 +26,15 @@
 class BaseSecretsBackend(ABC):
     """Abstract base class to retrieve Connection object given a conn_id or Variable given a key"""
 
-    def __init__(self, **kwargs):
-        pass
+    def __init__(self, connection_serialization_format: Optional[str] = None, **kwargs):
+        from airflow.configuration import conf  # must be local to avoid circular import

Review comment:
       do you think there is anything we could do about it?
   i suppose it would be possible to have a `bootstrap_configuration.py`, or `secrets_configuration.py`, where we parse the secrets backend config and initialize secrets backend before doing everything in configuration.py.  wdyt?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810225522



##########
File path: airflow/cli/commands/connection_command.py
##########
@@ -82,25 +83,35 @@ def connections_list(args):
         )
 
 
+def _connection_to_dict(conn: Connection) -> dict:
+    return dict(
+        conn_type=conn.conn_type,
+        description=conn.description,
+        login=conn.login,
+        password=conn.password,
+        host=conn.host,
+        port=conn.port,
+        schema=conn.schema,
+        extra=conn.extra,

Review comment:
       TLDR, when we deserialize we detect whether user stored object or string:
   https://github.com/apache/airflow/pull/19857/files#diff-8042973481905e77574983904cbbcdddaa15589506a3de14459152d2420f471eR397-R398
   
   more detail:
   it doesn't matter what is stored in `extra` because it is a string and when we serialize it we don't make any assumptions about what it contains.
   
   when we deserialize it, it is again a string and we pass it to Connection as a kwarg.
   
   So yes suppose we have this connection
   
   ```
   Connection(extra=json.dumps({'hi': 'bye'}))
   ```
   So the CLI exporter will export this like so
   ```
   '{"extra": "{\\"hi\\": \\"bye\\"}"}'
   ```
   So when we try to parse this again we do json.loads, and now we get a dict where `extra` is a plain json string, as desired:
   ```
   {'extra': '{"hi": "bye"}'}
   ```
   So, everything is cool right?
   There's one more detail here though.
   What we _do also_ support here, is serializing airflow conn extra fully as object.
   By that I mean, storing like this _is permissible_:
   ```
   '{"extra": {"hi": "bye"}}'
   ```
   And the reason we can support this is, if it's stored like this, `extra` will be `dict` when the value is loaded.
   We just can't _serialize_ in this way by default because there is no guarantee that `extra` will be json.
   
   So in sum, we support `[obj, str] > Connection.extra`, but when _generating_ the value in CLI export, we have to only support `Connection.extra -> str`.
   
   I _wish_ this were not the case, but because we don't require extra to be json (maybe until 3.0?) i think that's how it must be.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend.
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user,URIs, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but token is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put:
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _CMD"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user, host, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but token is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put:
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   However I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend.
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user,URIs, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but token is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put:
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _CMD"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user, host, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but credential is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put (`/connections/my_secret_password` is the "key" to use when querying Secret Backend):
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve from the metadata DB for most of the connection and combine it with credential(s) retrieved from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1040519027


   Is it like .. ready for review @dstandish  :) ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810015933



##########
File path: airflow/cli/commands/connection_command.py
##########
@@ -82,25 +83,35 @@ def connections_list(args):
         )
 
 
+def _connection_to_dict(conn: Connection) -> dict:
+    return dict(
+        conn_type=conn.conn_type,
+        description=conn.description,
+        login=conn.login,
+        password=conn.password,
+        host=conn.host,
+        port=conn.port,
+        schema=conn.schema,
+        extra=conn.extra,

Review comment:
       What about dejson? Wouldn't this likely result in the extra field being a string containing a JSON dict?, i.e. `"extra": "{\"extra_key\": \"value\"}"`
   
   (I'm not sure there's a ready solution to this, as it's _possible_ the extra isn't JSON




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1040528353


   > Is it like .. ready for review @dstandish :) ?
   
   yes i believe so 😅 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _cmd"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user,URIs, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but token is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put:
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980718633


   @potiuk  You don't necessary need to write a custom secrets backend in order to have secrets rotation.  E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764).  and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   that example:
   
   > And the tools that the organisation has treats those two separately
   
   is that really all that common?
   
   i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation.  i think it might be best to leave it to the external tools.
   
   and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow.  e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   --- 
   
   separately, what do you think of the general idea of this PR though?  should i proceed with it you think?  i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follow a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is important, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) from  Airflow side to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who thinks this is wrong and won't convince others, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check it with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980830665


   >  Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar, generic-enough solution for all of them. **I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.**
   
   That is not what I'm attempting to do here.  JSON is simply a better, friendlier way to store creds than URI.  And the cat is out of the bag with allowing JSON storage of connections.  So, I think we should embrace it and allow it uniformly.
   
   What's the alternative?  Do you think we should disallow entirely the use of json as secrets serialization format?  Or perhaps allow it on a case-by-case basis?  Or, perhaps for every backend we should make a json version such e.g. EnvironmentVariablesJSONBackend and SystemManagerParameterStoreJSONBackend etc?
   
   Separately, I do recognize that having one place to store creds retrievable from many systems can be nice.  Using JSON can actually help in this regard.  I've actually done this where jupyter and airflow pulled from same creds in aws systems manager.  But that's not the problem I'm trying to solve in this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1068503832


   thanks @potiuk happy to get this one over the finish line 🎉 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#discussion_r810225522



##########
File path: airflow/cli/commands/connection_command.py
##########
@@ -82,25 +83,35 @@ def connections_list(args):
         )
 
 
+def _connection_to_dict(conn: Connection) -> dict:
+    return dict(
+        conn_type=conn.conn_type,
+        description=conn.description,
+        login=conn.login,
+        password=conn.password,
+        host=conn.host,
+        port=conn.port,
+        schema=conn.schema,
+        extra=conn.extra,

Review comment:
       it doesn't matter what is stored in `extra` because it is a string and when we serialize it we don't make any assumptions about what it contains.
   
   when we deserialize it, it is again a string and we pass it to Connection as a kwarg.
   
   So yes suppose we have this connection
   
   ```
   Connection(extra=json.dumps({'hi': 'bye'}))
   ```
   So the CLI exporter will export this like so
   ```
   '{"extra": "{\\"hi\\": \\"bye\\"}"}'
   ```
   So when we try to parse this again we do json.loads, and now we get a dict where `extra` is a plain json string, as desired:
   ```
   {'extra': '{"hi": "bye"}'}
   ```
   So, everything is cool right?
   There's one more detail here though.
   What we _do also_ support here, is serializing airflow conn extra fully as object.
   By that I mean, storing like this _is permissible_:
   ```
   '{"extra": {"hi": "bye"}}'
   ```
   And the reason we can support this is, if it's stored like this, `extra` will be `dict` when the value is loaded.
   We just can't _serialize_ in this way by default because there is no guarantee that `extra` will be json.
   
   So in sum, we support `[obj, str] > Connection.extra`, but when _generating_ the value in CLI export, we have to only support `Connection.extra -> str`.
   
   I _wish_ this were not the case, but because we don't require extra to be json (maybe until 3.0?) i think that's how it must be.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1068193270


   > While we are at this, I think that we should mask the extra field
   
   i'd rather not add that to this one. it's big enough already and that seems very much like a separate change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1068537854


   🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1058896816


   had to refactor / simplify tests, done that in #21983, which now needs to be merged first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1058896816


   ~had to refactor / simplify tests, done that in #21983, which now needs to be merged first.~ <-- merged now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-1059715688


   OK finally have the docs in, i think, a decent place.  Really required a lot of work to try to get things where they feel reasonably coherent (talking bout the managing connections doc).
   
   I have attached the rendered doc for easier review.
   [Managing connections.zip](https://github.com/apache/airflow/files/8190261/Managing.connections.zip)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980644388


   Ah nice. Indeed Airflow connection URI IS trricky. 
   
   Comment while you are at it @dstandish, as this seems very much related
   
   I think this does not solve a specific case that several of our users mentioned. And I am just thinking whether we maybe also try to solve it systemically rather than telling the users "Write your custom Secrets Backend" of "use _CMD"
   
   There is a case where the users already have processes and tools to automatically rotate their credentials, but only when they are standalone "values" - not part of URI, not part of dictionary. The credentials (passwords/tokens etc). are the only things that change when they are rotated - all the rest - user,URIs, extras remain as they were. That for me is really, really valid use case - where part of the connection (the non-secret one) is 'static' but token is dynamic. And the tools that the organisation has treats those two separately. 
   
   Example issue https://github.com/apache/airflow/issues/19217 (but there were few other similar discussions).
   
   I do not have a "perfect" solution for that but I thought about an options of joinng the two options allow to connect "metadata" connection with "secret" credentials. 
   
   One possible solution I could imagine the case that you configure your connection in Airflow DB and in the place of password you put:
   
   ```
   SECRET:/connections/my_secret_password
   ```
   Then airflow could combine thet two and retrieve the metadata for most of the conneciton and secret from the secrets manager.
   
   I think it should be rather easy to implement something like that (and have a flag in secrets which would allow to combine metadata + secrets).
   
   WDYT?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
dstandish edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980718633


   @potiuk  You don't necessary need to write a custom secrets backend in order to have secrets rotation.  E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764).  and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   that example:
   
   > And the tools that the organisation has treats those two separately
   
   is that really all that common?
   
   i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation.  i think it might be best to leave it to the external tools.
   
   and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow.  e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   --- 
   
   separately, what do you think of the general idea of this PR though?  should i proceed with it you think?  i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   Already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize the approach.
   
   I considered possibly using a config key in `airflow.cfg` to flip the default expectation from URI to json. And we could do this and put it in `[secrets]`.  But the problem is then you don't have the flexibility to use json in one backend, URI in another etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19857: Enable json serialization for secrets backend

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in order to have secrets rotation. E.g. if you want to use aws secrets manager's built-in secrets rotation capabilities, the existing backend [now supports it](https://github.com/apache/airflow/pull/18764). and presumably with most of the backends we have, you could implement rotation with processes external to airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and you have to add quite a lot of code to support similar patterns for other secrets. I thought about something more "generic" . Maybe we should apply the approach of AWS for all other backends? Maybe we can find a similar,  generic-enough solution for all of them.  I think your proposal is attempt to do so, but I think (see below) it might be that it looks at the problem from a wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this case a person who operates organisational secrets and has to organize management. security, rotation etc.), I believe our approach is very "airflow centric" and in fact "selfish". If I were such a person who manages "credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should not think about them as "general storage" which adapts to the client but they are "centerpiece" of managing identities, secrets, passwords etc. by the organisations. And often follow a rigorous structure of secrets stored there, that is decided on an managed on "organisational" level. For example it is important, I believe, that if there is a service that can be acesses by many clients, the "secret" (password/token) is kept in the secret manager only once - and it's the clients should adapt to where and how it should be retrieved from. It simplifies rotation, security, access audits, security crisis responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects the secret to be in a format that only Airflow understands. And it's the same - whether it's URI or dictionary in your proposal. No other system that needs the same password for the same external service will be able to use it (unles they specifically implement support for "airflow-formatted-secret"). This basically means that if the same token needs to be used by another (non-Airlfow) client, the organisation will have to rotate it in two places - once in the format that is required by Airflow, and once in the format that is understood by the other client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the reason why we have configurable fields is because we wanted to tap into "non airflow formats" of storing the passwords/token. Where I think the only "universal" assumption we can make is that "password" or "token" is stored as "single, standalone secret". Not as part of URI, not as part of dictionary of specific structure. This is one of the reasons why for example LDAP clients are so configurable - because the clients have to be able to adapt to the organisational schema chosen by the organisation, not the other way round. AWS implementation (with full_url_mode = false) now is pretty much what LDAP clients do.
   
   Looking at it from the organisation that has to maintain many systems, I think it's very "selfish" (and self-centered) of Airflow to expect the organisation to store the "token/password" in the format that only Airflow understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / connections management / rotation. i think it might be best to leave it to the external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I made with `SECRET:' is probably not good (for the reason you explained below. It was really one of the ways how to retrieve the password as single "secret" and plug-in the existing Airflow approach. But the AWS approach is indeed better (though more complex implementation and more difficult to generalize to all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "passwords" as part of the structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part of it from _another_ secrets backend... that could get confusing pretty quickly, particularly when considering the diversity in the structure of connections in airflow. e.g. it's not just `password` that might be "secret" and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic enough - it is not limited to password only - the "SECRET://" pattern could be used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? should i proceed with it you think? i think i would also like to add abilitity to load from cli using json instead of URI (i.e. putting json on same level as URI) and, ultimately i think it's best to deprecate airflow URI but that could be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line of thinking. As explained above - it still requires from the organisation to keep the structure that only Airflow understands. I think this direction is not good in general, I think we should implement a way to retrieve secrets similarly to what AWS does rather than introduce yet-another format of storing secrets, but If I am the only one who thinks this is wrong and won't convince others, then yeah - it's certainly better than URI, so I am not going to block it.
   
   > already we have some secrets backends supporting json values, and we could continue to add support on a piecemeal basis, but i figure we should just make it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to impose "airflow-standard" on something that I feel we should not (i.e. centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets in proprietary formats. Maybe it's simply worth to check it with a number of (big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org