You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/07 22:17:16 UTC

[GitHub] [airflow] ashb opened a new pull request #8772: Correctly store non-default Nones in serialized tasks/dags

ashb opened a new pull request #8772:
URL: https://github.com/apache/airflow/pull/8772


   The default schedule_interval for a DAG is `@daily`, so
   `schedule_interval=None` is actually not the default, but we were not
   storing _any_ null attributes previously.
   
   This meant that upon re-inflating the DAG the schedule_interval would
   become @daily.
   
   This fixes that problem, and extends the test to look at _all_ the
   serialized attributes in our round-trip tests, rather than just the few
   that the webserver cared about.
   
   It doesn't change the serialization format, it just changes what/when
   values were stored.
   
   This solution was more complex than I hoped for, but the test case in
   test_operator_subclass_changing_base_defaults is a real one that the
   round trip tests discovered from the DatabricksSubmitRunOperator -- I
   have just captured it in this test in case that specific operator
   changes in future.
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421982146



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -254,7 +260,12 @@ def _deserialize_timedelta(cls, seconds: int) -> datetime.timedelta:
         return datetime.timedelta(seconds=seconds)
 
     @classmethod
-    def _value_is_hardcoded_default(cls, attrname: str, value: Any) -> bool:
+    def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:

Review comment:
       Thanks, was working on (what I thought would be a quick fix) this for 11 hours. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625803075


   Curious failure:
   
   ```
   E                   AssertionError: default_args[start_date] matches
   E                   assert <Pendulum [2020-05-07T22:49:48.687518+00:00]> == <Pendulum [2020-05-07T22:49:46.451514+00:00]>
   ```
   
   I wonder why I didn't see that locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421826962



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default():
+            nonlocal ctor_params, attrname, value

Review comment:
       I went through a number of iterations of way of doing this, it may be clearer just to pass them in anyway on Py3. WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421826729



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default():
+            nonlocal ctor_params, attrname, value

Review comment:
       This won't work on python2, instead we'd have to do
   
   ```
           def _is_default(ctor_params, attrname, value):
   ```
   
   and pass em in at the call site




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422119825



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -119,11 +120,16 @@ def _is_primitive(cls, var: Any) -> bool:
     @classmethod
     def _is_excluded(cls, var: Any, attrname: str, instance: Any) -> bool:
         """Types excluded from serialization."""
-        # pylint: disable=unused-argument
+
+        if var is None:
+            if not cls._is_constcutor_param(attrname, instance):

Review comment:
       ```suggestion
               if not cls._is_constructor_param(attrname, instance):
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422534060



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +416,60 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default(ctor_params, attrname, value):
+            if attrname not in ctor_params:
+                return False
+            ctor_default = ctor_params[attrname].default
+
+            # Also returns True if the value is an empty list or empty dict.
+            # This is done to account for the case where the default value of
+            # the field is None but has the ``field = field or {}`` set.
+            return ctor_default is value or (ctor_default is None and value in [{}, []])
+
+        for typ in type(instance).mro():
+            ctor_params = cls.__constructor_params_for_subclass(typ)
+
+            if _is_default(ctor_params, attrname, value):
+                if typ is BaseOperator:
+                    return True
+                # For added fun, if a subclass sets a different default value to the
+                # same argument, (i.e. a subclass changes default of do_xcom_push from
+                # True to False), we then do want to include it.
+                #
+                # This is because we set defaults based on BaseOperators
+                # defaults, so if we didn't set this when inflating we'd
+                # have the wrong value
+
+                base_op_ctor_params = cls.__constructor_params_for_subclass(BaseOperator)
+                if attrname not in base_op_ctor_params:
+                    return True
+                return base_op_ctor_params[attrname].default == value
+
+            if typ is BaseOperator:
+                break

Review comment:
       It's the subclass test I added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422534563



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +416,60 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default(ctor_params, attrname, value):
+            if attrname not in ctor_params:
+                return False
+            ctor_default = ctor_params[attrname].default
+
+            # Also returns True if the value is an empty list or empty dict.
+            # This is done to account for the case where the default value of
+            # the field is None but has the ``field = field or {}`` set.
+            return ctor_default is value or (ctor_default is None and value in [{}, []])
+
+        for typ in type(instance).mro():
+            ctor_params = cls.__constructor_params_for_subclass(typ)
+
+            if _is_default(ctor_params, attrname, value):
+                if typ is BaseOperator:
+                    return True
+                # For added fun, if a subclass sets a different default value to the
+                # same argument, (i.e. a subclass changes default of do_xcom_push from
+                # True to False), we then do want to include it.
+                #
+                # This is because we set defaults based on BaseOperators
+                # defaults, so if we didn't set this when inflating we'd
+                # have the wrong value
+
+                base_op_ctor_params = cls.__constructor_params_for_subclass(BaseOperator)
+                if attrname not in base_op_ctor_params:
+                    return True
+                return base_op_ctor_params[attrname].default == value
+
+            if typ is BaseOperator:
+                break

Review comment:
       👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422121186



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -285,10 +297,14 @@ class SerializedBaseOperator(BaseOperator, BaseSerialization):
 
     _decorated_fields = {'executor_config'}
 
-    _CONSTRUCTOR_PARAMS = {
-        k: v for k, v in signature(BaseOperator).parameters.items()
-        if v.default is not v.empty
-    }
+    @staticmethod
+    @functools.lru_cache(maxsize=128)

Review comment:
       Can you add a comment on the benefits / need of using `lru_cache` here

##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -285,10 +297,14 @@ class SerializedBaseOperator(BaseOperator, BaseSerialization):
 
     _decorated_fields = {'executor_config'}
 
-    _CONSTRUCTOR_PARAMS = {
-        k: v for k, v in signature(BaseOperator).parameters.items()
-        if v.default is not v.empty
-    }
+    @staticmethod
+    @functools.lru_cache(maxsize=128)

Review comment:
       Can you add a comment on the benefits / need of using `lru_cache` here, please?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626244699


   I was wrong about where that fn was used, so I've removed that complex code an replace it with this in the test instead:
   
   ```python
           if serialized_task.resources is None:
               assert task.resources is None or task.resources == []
           else:
               assert serialized_task.resources == task.resources
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626240186


   Feel free to merge once we add a comment in the doc on why we kept the change.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626244825


   (We've got that code in this diff comment, we can bring it back when we want it, but that will need more changes elsewhere to serialize more fields


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422375825



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default():
+            nonlocal ctor_params, attrname, value

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625814458


   Curious, the DAG with the problem has `    'start_date': datetime.utcnow()` -- that's "wrong" but I wouldn't have expected this test to fail. I guess we are loading the dag twice, once with s10n, once without and _then_comparing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422529496



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +416,60 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default(ctor_params, attrname, value):
+            if attrname not in ctor_params:
+                return False
+            ctor_default = ctor_params[attrname].default
+
+            # Also returns True if the value is an empty list or empty dict.
+            # This is done to account for the case where the default value of
+            # the field is None but has the ``field = field or {}`` set.
+            return ctor_default is value or (ctor_default is None and value in [{}, []])
+
+        for typ in type(instance).mro():
+            ctor_params = cls.__constructor_params_for_subclass(typ)
+
+            if _is_default(ctor_params, attrname, value):
+                if typ is BaseOperator:
+                    return True
+                # For added fun, if a subclass sets a different default value to the
+                # same argument, (i.e. a subclass changes default of do_xcom_push from
+                # True to False), we then do want to include it.
+                #
+                # This is because we set defaults based on BaseOperators
+                # defaults, so if we didn't set this when inflating we'd
+                # have the wrong value
+
+                base_op_ctor_params = cls.__constructor_params_for_subclass(BaseOperator)
+                if attrname not in base_op_ctor_params:
+                    return True
+                return base_op_ctor_params[attrname].default == value
+
+            if typ is BaseOperator:
+                break

Review comment:
       Checking BaseOperator might be enough. I think there was a test you said was failing, do you know which one was that?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] BasPH commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
BasPH commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421977634



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -254,7 +260,12 @@ def _deserialize_timedelta(cls, seconds: int) -> datetime.timedelta:
         return datetime.timedelta(seconds=seconds)
 
     @classmethod
-    def _value_is_hardcoded_default(cls, attrname: str, value: Any) -> bool:
+    def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:

Review comment:
       ```suggestion
       def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625814879


   Oh, cos it's parsing in a subprocess. Hmmmm!
   
   I don't think we need to parse _all_ dags in a subprocess, just one is enough.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422132180



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
 
         return op
 
+    @classmethod
+    def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+        # Check all super classes too
+        return any(
+            attrname in cls.__constructor_params_for_subclass(typ)
+            for typ in type(instance).mro()
+        )
+
+    @classmethod
+    def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+        """
+        Check if ``value`` is the default value for ``attrname`` as set by the
+        constructor of ``instance``, or any of it's parent classes up
+        to-and-including BaseOperator.
+
+        .. seealso::
+
+            :py:meth:`BaseSerialization._value_is_hardcoded_default`
+        """
+
+        def _is_default():
+            nonlocal ctor_params, attrname, value

Review comment:
       Yeah let's just pass them in




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626013593


   > Oh, cos it's parsing in a subprocess. Hmmmm!
   > 
   > I don't think we need to parse _all_ dags in a subprocess, just one is enough.
   
   Changed it to only parse example_dags (rather than all the provider dags too) in the subprocess, and the rest are just round-tripped. This cut the test time for test_serialized_dags from 8s to 4-5s on my laptop too as a bonus :) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422534178



##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -648,6 +683,23 @@ def test_dag_serialized_fields_with_schema(self):
         dag_params: set = set(dag_schema.keys()) - ignored_keys
         self.assertEqual(set(DAG.get_serialized_fields()), dag_params)
 
+    def test_operator_subclass_changing_base_defaults(self):

Review comment:
       @kaxil this is the case that means we need to check the MRO for defaults.
   
   I think storing all non-defaults is nicer anyway - we can show these in the UI that way




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625803983


   > Curious failure:
   > 
   > ```
   > E                   AssertionError: default_args[start_date] matches
   > E                   assert <Pendulum [2020-05-07T22:49:48.687518+00:00]> == <Pendulum [2020-05-07T22:49:46.451514+00:00]>
   > ```
   > 
   > I wonder why I didn't see that locally.
   
   somewhere `start_date` might be `.now()` causing it ?? not sure


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422127805



##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -273,58 +271,70 @@ def test_deserialization(self):
         for dag_id in stringified_dags:
             self.validate_deserialized_dag(stringified_dags[dag_id], dags[dag_id])
 
-        example_skip_dag = stringified_dags['example_skip_dag']
-        skip_operator_1_task = example_skip_dag.task_dict['skip_operator_1']
-        self.validate_deserialized_task(
-            skip_operator_1_task, 'DummySkipOperator', '#e8b7e4', '#000')
-
-        # Verify that the DAG object has 'full_filepath' attribute
-        # and is equal to fileloc
-        self.assertTrue(hasattr(example_skip_dag, 'full_filepath'))
-        self.assertEqual(example_skip_dag.full_filepath, example_skip_dag.fileloc)
-
-        example_subdag_operator = stringified_dags['example_subdag_operator']
-        section_1_task = example_subdag_operator.task_dict['section-1']
-        self.validate_deserialized_task(
-            section_1_task,
-            SubDagOperator.__name__,
-            SubDagOperator.ui_color,
-            SubDagOperator.ui_fgcolor
-        )
-
     def validate_deserialized_dag(self, serialized_dag, dag):
         """
         Verify that all example DAGs work with DAG Serialization by
         checking fields between Serialized Dags & non-Serialized Dags
         """
-        fields_to_check = [
-            "task_ids", "params", "fileloc", "max_active_runs", "concurrency",
-            "is_paused_upon_creation", "doc_md", "safe_dag_id", "is_subdag",
-            "catchup", "description", "start_date", "end_date", "parent_dag",
-            "template_searchpath", "_access_control", "dagrun_timeout"
-        ]
-
-        # fields_to_check = dag.get_serialized_fields()
+        fields_to_check = dag.get_serialized_fields() - {
+            # Doesn't implement __eq__ properly. Check manually
+            'timezone',
+
+            # Need to check fields in it, to exclude functions
+            'default_args',
+        }
         for field in fields_to_check:
-            self.assertEqual(getattr(serialized_dag, field), getattr(dag, field))
+            assert getattr(serialized_dag, field) == getattr(dag, field), f'{field} matches'
+
+        if dag.default_args:
+            for k, v in dag.default_args.items():
+                if callable(v):
+                    # Check we stored _someting_.
+                    assert k in serialized_dag.default_args
+                else:
+                    assert v == serialized_dag.default_args[k], f'default_args[{k}] matches'

Review comment:
       ```suggestion
                       assert v == serialized_dag.default_args[k], f'default_args[{k}] does not match'
   ```
   
   ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org