You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/25 07:08:18 UTC

[GitHub] [airflow] dstandish opened a new pull request #21816: Deprecate non-JSON conn.extra

dstandish opened a new pull request #21816:
URL: https://github.com/apache/airflow/pull/21816


   Connection extra field is generally assumed to be JSON but we don't actually require it.  Here we deprecate non-JSON extra so that in 3.0 we can require it.  Further, we require that it not just be any json but must also parse as dict, because a string value such as '"hi"' or '[1,2,3]' is json, but a very bad practice.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1053732867


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1054774060


   _Python_ should be capitalised everywhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #21816:
URL: https://github.com/apache/airflow/pull/21816


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1053732846


   Agree with @mik-laj that UPDATING.md is needed before merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on a change in pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
dstandish commented on a change in pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#discussion_r816504357



##########
File path: UPDATING.md
##########
@@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### Deprecation: `Connection.extra` must be JSON-encoded dict
+
+#### TLDR
+
+From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded python dict.

Review comment:
       ```suggestion
   From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded Python dict.
   ```

##########
File path: UPDATING.md
##########
@@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### Deprecation: `Connection.extra` must be JSON-encoded dict
+
+#### TLDR
+
+From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded python dict.
+
+#### What, why, and when?
+
+Airflow's Connection is used for storing credentials.  For storage of information that does not
+fit into user / password / host / schema / port, we have the `extra` string field.  Its intention
+was always to provide for storage of arbitrary key-value pairs, like `no_host_key_check` in the SSH
+hook, or `keyfile_dict` in GCP.
+
+But since the field is string, it's technically been permissible to store any string value.  For example
+one could have stored the string value `'my-website.com'` and used this in the hook.  But this is a very
+bad practice. One reason is intelligibility: when you look at the value for `extra`, you don't have any idea
+what its purpose is.  Better would be to store `{"api_host": "my-website.com"}` which at least tells you
+*something* about the value.  Another reason is extensibility: if you store the API host as a simple string
+value, what happens if you need to add more information, such as the API endpoint, or credentials?  Then
+you would need to convert the string to a dict, and this would be a breaking change.
+
+For these reason, starting in Airflow 3.0 we will require that the `Connection.extra` field store
+a JSON-encoded python dict.

Review comment:
       ```suggestion
   a JSON-encoded Python dict.
   ```

##########
File path: airflow/models/connection.py
##########
@@ -134,10 +134,39 @@ def __init__(
             self.schema = schema
             self.port = port
             self.extra = extra
+        if self.extra:
+            self._validate_extra(self.extra, self.conn_id)
 
         if self.password:
             mask_secret(self.password)
 
+    @staticmethod
+    def _validate_extra(extra, conn_id) -> None:
+        """
+        Here we verify that ``extra`` is a JSON-encoded python dict.  From Airflow 3.0, we should no

Review comment:
       ```suggestion
           Here we verify that ``extra`` is a JSON-encoded Python dict.  From Airflow 3.0, we should no
   ```

##########
File path: UPDATING.md
##########
@@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### Deprecation: `Connection.extra` must be JSON-encoded dict
+
+#### TLDR
+
+From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded python dict.
+
+#### What, why, and when?
+
+Airflow's Connection is used for storing credentials.  For storage of information that does not
+fit into user / password / host / schema / port, we have the `extra` string field.  Its intention
+was always to provide for storage of arbitrary key-value pairs, like `no_host_key_check` in the SSH
+hook, or `keyfile_dict` in GCP.
+
+But since the field is string, it's technically been permissible to store any string value.  For example
+one could have stored the string value `'my-website.com'` and used this in the hook.  But this is a very
+bad practice. One reason is intelligibility: when you look at the value for `extra`, you don't have any idea
+what its purpose is.  Better would be to store `{"api_host": "my-website.com"}` which at least tells you
+*something* about the value.  Another reason is extensibility: if you store the API host as a simple string
+value, what happens if you need to add more information, such as the API endpoint, or credentials?  Then
+you would need to convert the string to a dict, and this would be a breaking change.
+
+For these reason, starting in Airflow 3.0 we will require that the `Connection.extra` field store
+a JSON-encoded python dict.
+
+#### How will I be affected?
+
+For users of providers that are included in the Airflow codebase, you should not have to make any changes
+because in the Airflow codebase we should not allow hooks to misuse the `Connection.extra` field in this way.
+
+However, if you have any custom hooks that store something other than JSON dict, you will have to update it.
+If you do, you should see a warning any time that this connection is retrieved or instantiated (e.g. it should show up in
+task logs).
+
+To see if you have any connections that will need to be updated, you can run this command:
+
+```shell
+airflow connections export - 2>&1 >/dev/null | grep 'non-JSON'
+```
+
+This will catch any warnings about connections that are storing something other than JSON-encoded python dict in the `extra` field.

Review comment:
       ```suggestion
   This will catch any warnings about connections that are storing something other than JSON-encoded Python dict in the `extra` field.
   ```

##########
File path: airflow/models/connection.py
##########
@@ -134,10 +134,39 @@ def __init__(
             self.schema = schema
             self.port = port
             self.extra = extra
+        if self.extra:
+            self._validate_extra(self.extra, self.conn_id)
 
         if self.password:
             mask_secret(self.password)
 
+    @staticmethod
+    def _validate_extra(extra, conn_id) -> None:
+        """
+        Here we verify that ``extra`` is a JSON-encoded python dict.  From Airflow 3.0, we should no
+        longer suppress these errors but raise instead.
+        """
+        if extra is None:
+            return None
+        try:
+            extra_parsed = json.loads(extra)
+            if not isinstance(extra_parsed, dict):
+                warnings.warn(
+                    "Encountered JSON value in `extra` which does not parse as a dictionary in "
+                    f"connection {conn_id!r}. From Airflow 3.0, the `extra` field must contain a JSON "
+                    "representation of a python dict.",

Review comment:
       ```suggestion
                       "representation of a Python dict.",
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1055104036


   > _Python_ should be capitalised everywhere.
   
   done thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1050980451


   Can you add a note in UPGRADING.md?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1055544189


   Maybe the test will pass? repeating issue (not fully consistent but we need to fix it soon).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1054582122


   Added section in updating.md.
   
   Users can find all "bad" connections with `airflow connections export - 2>&1 >/dev/null | grep 'non-JSON'`
   
   We can add this to upgrade_check when we're preparing for 3.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#discussion_r815239416



##########
File path: airflow/models/connection.py
##########
@@ -134,10 +134,32 @@ def __init__(
             self.schema = schema
             self.port = port
             self.extra = extra
+        if self.extra:
+            self._validate_extra(self.extra, self.conn_id)
 
         if self.password:
             mask_secret(self.password)
 
+    @staticmethod
+    def _validate_extra(extra, conn_id):
+        try:
+            _extra_parsed = json.loads(extra)
+            if not isinstance(_extra_parsed, dict):
+                warnings.warn(
+                    "Encountered JSON value in `extra` which does not parse as a dictionary in "
+                    f"connection {conn_id!r}. From Airflow 3.0, the `extra` field must contain a JSON "
+                    "representation of a python dict.",
+                    DeprecationWarning,
+                    stacklevel=2,

Review comment:
       ```suggestion
                       stacklevel=3,
   ```
   
   I think this is the right level?

##########
File path: airflow/models/connection.py
##########
@@ -134,10 +134,32 @@ def __init__(
             self.schema = schema
             self.port = port
             self.extra = extra
+        if self.extra:
+            self._validate_extra(self.extra, self.conn_id)
 
         if self.password:
             mask_secret(self.password)
 
+    @staticmethod
+    def _validate_extra(extra, conn_id):
+        try:
+            _extra_parsed = json.loads(extra)
+            if not isinstance(_extra_parsed, dict):

Review comment:
       ```suggestion
               extra_parsed = json.loads(extra)
               if not isinstance(extra_parsed, dict):
   ```
   
   nit: locally scoped, doesn't need to be private.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1054537638


   > Agree with @mik-laj that UPDATING.md is needed before merging
   
   yes of course will be taking care of that


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #21816: Deprecate non-JSON conn.extra

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#issuecomment-1050980451


   Can you add a note in UPDATING.md?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org