You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/05 18:39:23 UTC

[GitHub] [airflow] vincbeck commented on a diff in pull request #25432: Avoid requirement that AWS Secret Manager JSON values be urlencoded.

vincbeck commented on code in PR #25432:
URL: https://github.com/apache/airflow/pull/25432#discussion_r939093095


##########
airflow/providers/amazon/aws/secrets/secrets_manager.py:
##########
@@ -94,6 +95,9 @@ class SecretsManagerBackend(BaseSecretsBackend, LoggingMixin):
     :param full_url_mode: if True, the secrets must be stored as one conn URI in just one field per secret.
         If False (set it as false in backend_kwargs), you can store the secret using different
         fields (password, user...).
+    :param secret_values_are_urlencoded: If True, and full_url_mode is False, then the values are assumed to

Review Comment:
   For such boolean value, the verb is/are is always at the beginning
   
   ```suggestion
       :param are_secret_values_urlencoded: If True, and full_url_mode is False, then the values are assumed to
   ```



##########
airflow/providers/amazon/aws/secrets/secrets_manager.py:
##########
@@ -150,31 +175,154 @@ def _format_uri_with_extra(secret, conn_string):
 
         return conn_string
 
-    def get_uri_from_secret(self, secret):
+    def get_connection(self, conn_id: str) -> Optional[Connection]:
+        if not self.full_url_mode:
+            secret_string = self._get_secret(self.connections_prefix, conn_id)
+            secret_dict = self._deserialize_json_string(secret_string)
+
+            if not secret_dict:
+                return None
+
+            if 'extra' in secret_dict and isinstance(secret_dict['extra'], str):
+                secret_dict['extra'] = self._deserialize_json_string(secret_dict['extra'])
+
+            data = self._standardize_secret_keys(secret_dict)
+
+            if self.secret_values_are_urlencoded:
+                data = self._remove_escaping_in_secret_dict(secret=data, conn_id=conn_id)
+
+            port: Optional[int] = None
+
+            if data['port'] is not None:
+                port = int(data['port'])
+
+            return Connection(
+                login=data['user'],
+                password=data['password'],
+                host=data['host'],
+                port=port,
+                schema=data['schema'],
+                conn_type=data['conn_type'],
+                extra=data['extra'],
+            )
+
+        return super().get_connection(conn_id=conn_id)
+
+    def _standardize_secret_keys(self, secret: Dict[str, Any]) -> Dict[str, Any]:
+        """Standardize the names of the keys in the dict. These keys align with"""
         possible_words_for_conn_fields = {
             'user': ['user', 'username', 'login', 'user_name'],
             'password': ['password', 'pass', 'key'],
             'host': ['host', 'remote_host', 'server'],
             'port': ['port'],
             'schema': ['database', 'schema'],
             'conn_type': ['conn_type', 'conn_id', 'connection_type', 'engine'],
+            'extra': ['extra'],
         }
 
         for conn_field, extra_words in self.extra_conn_words.items():
             possible_words_for_conn_fields[conn_field].extend(extra_words)
 
-        conn_d = {}
+        conn_d: Dict[str, Any] = {}
         for conn_field, possible_words in possible_words_for_conn_fields.items():
             try:
                 conn_d[conn_field] = [v for k, v in secret.items() if k in possible_words][0]
             except IndexError:
-                conn_d[conn_field] = ''
+                conn_d[conn_field] = None
 
-        conn_string = "{conn_type}://{user}:{password}@{host}:{port}/{schema}".format(**conn_d)
+        return conn_d
 
+    def get_uri_from_secret(self, secret: Dict[str, str]) -> str:
+        conn_d: Dict[str, str] = {k: v if v else '' for k, v in self._standardize_secret_keys(secret).items()}
+        conn_string = "{conn_type}://{user}:{password}@{host}:{port}/{schema}".format(**conn_d)
         return self._format_uri_with_extra(secret, conn_string)
 
-    def get_conn_value(self, conn_id: str):
+    def _deserialize_json_string(self, value: Optional[str]) -> Optional[Dict[Any, Any]]:
+        if not value:
+            return None
+        try:
+            # Use ast.literal_eval for backwards compatibility.
+            # Previous version of this code had a comment saying that using json.loads caused errors.
+            # This likely means people were using dict reprs instead of valid JSONs.
+            res: Dict[str, Any] = json.loads(value)
+        except json.JSONDecodeError:
+            try:
+                res = ast.literal_eval(value) if value else None
+                warnings.warn(
+                    f'In future versions, `{type(self).__name__}` will only support valid JSONs, not dict'
+                    ' reprs. Please make sure your secret is a valid JSON.'
+                )
+            except ValueError:  # 'malformed node or string: ' error, for empty conns
+                return None
+
+        return res
+
+    def _remove_escaping_in_secret_dict(self, secret: Dict[str, Any], conn_id: str) -> Dict[str, Any]:
+        # When ``unquote(v) == v``, then removing unquote won't affect the user, regardless of
+        # whether or not ``v`` is URL-encoded. For example, "foo bar" is not URL-encoded. But
+        # because decoding it doesn't affect the value, then it will migrate safely when
+        # ``unquote`` gets removed.
+        #
+        # When parameters are URL-encoded, but decoding is idempotent, we need to warn the user
+        # to un-escape their secrets. For example, if "foo%20bar" is a URL-encoded string, then
+        # decoding is idempotent because ``unquote(unquote("foo%20bar")) == unquote("foo%20bar")``.
+        #
+        # In the rare situation that value is URL-encoded but the decoding is _not_ idempotent,
+        # this causes a major issue. For example, if "foo%2520bar" is URL-encoded, then decoding is
+        # _not_ idempotent because ``unquote(unquote("foo%2520bar")) != unquote("foo%2520bar")``
+        #
+        # This causes a problem for migration because if the user decodes their value, we cannot
+        # infer that is the case by looking at the decoded value (from our vantage point, it will
+        # look to be URL-encoded.)
+        #
+        # So when this uncommon situation occurs, the user _must_ adjust the configuration and set
+        # ``parameters_are_urlencoded`` to False to migrate safely. In all other cases, we do not
+        # need the user to adjust this object to migrate; they can transition their secrets with
+        # the default configuration.
+
+        warn_user = False
+        idempotent = True
+
+        for k, v in secret.copy().items():
+
+            if k == "extra" and isinstance(v, dict):
+                # The old behavior was that extras were _not_ urlencoded inside the secret.
+                # If they were urlencoded (e.g. "foo%20bar"), then they would be re-urlencoded
+                # (e.g. "foo%20bar" becomes "foo%2520bar") and then unquoted once when parsed.
+                # So we should just allow the extra dict to remain as-is.
+                continue
+
+            elif v is not None:
+                v_unquoted = unquote(v)
+                if v != v_unquoted:
+                    secret[k] = unquote(v)
+                    warn_user = True
+
+                    # Check to see if decoding is idempotent.
+                    if v_unquoted == unquote(v_unquoted):
+                        idempotent = False
+
+        if warn_user:
+            msg = (
+                "When ``full_url_mode=True``, URL-encoding secret values is deprecated. In future versions, "
+                f" this value will not be un-escaped. For the conn_id {conn_id!r}, please remove the"
+                " URL-encoding."
+                "\n\nThis warning was raised because the SecretsManagerBackend detected that this connection"
+                " was URL-encoded."
+            )
+            if idempotent:
+                msg = f" Once the values for conn_id {conn_id!r} are decoded, this warning will go away."
+            if not idempotent:
+                msg += (
+                    " In addition to decoding the values for your connection, you must also set"
+                    " ``secret_values_are_urlencoded=False`` for your config variable"
+                    " ``secrets.backend_kwargs`` because this connection's URL encoding is not idempotent."

Review Comment:
   I am not sure `because this connection's URL encoding is not idempotent` gives enough details to the user in order to understand the issue. I would add a link to the documentation you are updating in this PR



##########
docs/apache-airflow-providers-amazon/secrets-backends/aws-secrets-manager.rst:
##########
@@ -83,6 +83,27 @@ Verify that you can get the secret:
 
 If you don't want to use any ``connections_prefix`` for retrieving connections, set it as an empty string ``""`` in the configuration.
 
+URL-Encoding of Secrets When Full URL Mode is False
+"""""""""""""""""""""""""""""""""""""""""""""""""""
+
+Previous versions of the Amazon provider package required values in the AWS secret to be URL-encoded when the setting ``full_url_mode`` is ``false``.
+This behavior is now deprecated, and will be removed at a future date.
+
+In most cases, you should not have any issues migrating your secrets to not being URL-encoded in advance of the deprecation.
+Simply decoding your secret values will work, and no further changes are required.
+
+In rare circumstances, when URL-encoding is not idempotent, the ``DeprecationWarning`` will tell you to add a new parameter to your ``backend_kwargs``.

Review Comment:
   I would add an example to explain what idempotent means in this situation. I would actually reuse the example you gave in comments
   ```
   For example, if "foo%2520bar" is URL-encoded, then decoding is _not_ idempotent because ``unquote(unquote("foo%2520bar")) != unquote("foo%2520bar")``
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org