You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/10 13:28:11 UTC

[GitHub] [airflow] bolkedebruin opened a new pull request, #28829: Make allowed_deserialization_classes more intuitive

bolkedebruin opened a new pull request, #28829:
URL: https://github.com/apache/airflow/pull/28829

   Regexps can be tough to get right. Typically someone would like to allow any classes below 'mymodule' to match. For example, 'mymodule.dataclasses' by setting allowed_deserialization_classes to 'mymodule.*'. However this matches everything starting with mymodule, so also mymodulemalicious. This change replaces bare '.' with '\..' so it matches the literal '.' as well.
   
   @kaxil 
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bolkedebruin merged pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by GitBox <gi...@apache.org>.
bolkedebruin merged PR #28829:
URL: https://github.com/apache/airflow/pull/28829


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] barrysteyn commented on pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by "barrysteyn (via GitHub)" <gi...@apache.org>.
barrysteyn commented on PR #28829:
URL: https://github.com/apache/airflow/pull/28829#issuecomment-1704668176

   This change has totally broken things... for example, I have class that needs to be on the `allowed_deserialization_classes` list... the error message I get is `libs.dynamodb.models.ideas.IdeasModel` is not on the allowed list.. I do understand that this is regex that is being matched against, however altering the regex is not (in my opinion) good. Firstly, people who are doing this are technical and should know that a `.` matches anything.
   
   But... the worst thing is that things don't work... For example, I have tested things by putting the following in the`allowed_deserialization_classes` (remember I am testing *libs.dynamodb.models.ideas.IdeasModel* for a match):
   
    1. `libs.dynamodb.models.ideas.IdeasModel` --> False (not expected)
    2. `libs[.]dynamodb[.]models[.]ideas[.]IdeasModel` --> True (weird)
    3. `libs.dynamodb.*` --> False
    4. `libs.*` --> True (this mimic'd the unit test of this change).
   
   It is obvious that the replace `.` with `..` is the culprit. Instead, I suggest we undo the changes, and rather document in detail on how to use the regex.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kaxil commented on a diff in pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by GitBox <gi...@apache.org>.
kaxil commented on code in PR #28829:
URL: https://github.com/apache/airflow/pull/28829#discussion_r1065838979


##########
airflow/config_templates/config.yml:
##########
@@ -225,7 +225,7 @@ core:
       description: |
         What classes can be imported during deserialization. This is a multi line value.
         The individual items will be parsed as regexp. Python built-in classes (like dict)
-        are always allowed
+        are always allowed. Bare "." will be replaced so you can set airflow.* .

Review Comment:
   `airflow/config_templates/default_airflow.cfg` needs to be updated as well
   
   ```diff
   diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
   index 8f704c3..5a6e03e 100644
   --- a/airflow/config_templates/default_airflow.cfg
   +++ b/airflow/config_templates/default_airflow.cfg
   @@ -137,7 +137,7 @@ enable_xcom_pickling = False
    
    # What classes can be imported during deserialization. This is a multi line value.
    # The individual items will be parsed as regexp. Python built-in classes (like dict)
   -# are always allowed
   +# are always allowed. Bare "." will be replaced so you can set airflow.* .
    allowed_deserialization_classes = airflow\..*
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] barrysteyn commented on pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by "barrysteyn (via GitHub)" <gi...@apache.org>.
barrysteyn commented on PR #28829:
URL: https://github.com/apache/airflow/pull/28829#issuecomment-1705501321

   So, this is the regex replace for this PR:
   `p = re.sub(r"(\w)\.", r"\1\..", p)`
   
   I am no expert, but it seems to me that it replaces `word.` with `word\..` - I think the idea is to match `.`, however all that does is ensure that there must be a `.`  followed by any character... That is why things are failing...
   
   I don't advocate for changing anything, but if you really want to match just `.`, then the regex should be  `p = re.sub(r"(\w)\.", r"\1[.]", p)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] barrysteyn commented on pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by "barrysteyn (via GitHub)" <gi...@apache.org>.
barrysteyn commented on PR #28829:
URL: https://github.com/apache/airflow/pull/28829#issuecomment-1705501746

   At the very least, if I am using things incorrectly, then add more tests to explain how it should work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kaxil commented on a diff in pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by GitBox <gi...@apache.org>.
kaxil commented on code in PR #28829:
URL: https://github.com/apache/airflow/pull/28829#discussion_r1065837802


##########
tests/serialization/test_serde.py:
##########
@@ -32,7 +32,7 @@
     VERSION,
     _compile_patterns,
     deserialize,
-    serialize,
+    serialize, _match,

Review Comment:
   Static checks are failing for this:
   
   ```diff
   diff --git a/tests/serialization/test_serde.py b/tests/serialization/test_serde.py
   index 88b012d..475525c 100644
   --- a/tests/serialization/test_serde.py
   +++ b/tests/serialization/test_serde.py
   @@ -31,8 +31,9 @@ from airflow.serialization.serde import (
        SCHEMA_ID,
        VERSION,
        _compile_patterns,
   +    _match,
        deserialize,
   -    serialize, _match,
   +    serialize,
    )
    from airflow.utils.module_loading import qualname
    from tests.test_utils.config import conf_vars
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pierrejeambrun commented on pull request #28829: Make allowed_deserialization_classes more intuitive

Posted by "pierrejeambrun (via GitHub)" <gi...@apache.org>.
pierrejeambrun commented on PR #28829:
URL: https://github.com/apache/airflow/pull/28829#issuecomment-1456911236

   Depends on https://github.com/apache/airflow/pull/28829


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org