You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/07 22:17:16 UTC
[GitHub] [airflow] ashb opened a new pull request #8772: Correctly store non-default Nones in serialized tasks/dags
ashb opened a new pull request #8772:
URL: https://github.com/apache/airflow/pull/8772
The default schedule_interval for a DAG is `@daily`, so
`schedule_interval=None` is actually not the default, but we were not
storing _any_ null attributes previously.
This meant that upon re-inflating the DAG the schedule_interval would
become @daily.
This fixes that problem, and extends the test to look at _all_ the
serialized attributes in our round-trip tests, rather than just the few
that the webserver cared about.
It doesn't change the serialization format, it just changes what/when
values were stored.
This solution was more complex than I hoped for, but the test case in
test_operator_subclass_changing_base_defaults is a real one that the
round trip tests discovered from the DatabricksSubmitRunOperator -- I
have just captured it in this test in case that specific operator
changes in future.
---
Make sure to mark the boxes below before creating PR: [x]
- [x] Description above provides context of the change
- [x] Unit tests coverage for changes (not needed for documentation changes)
- [x] Target Github ISSUE in description if exists
- [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
- [x] Relevant documentation is updated including usage instructions.
- [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
---
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421982146
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -254,7 +260,12 @@ def _deserialize_timedelta(cls, seconds: int) -> datetime.timedelta:
return datetime.timedelta(seconds=seconds)
@classmethod
- def _value_is_hardcoded_default(cls, attrname: str, value: Any) -> bool:
+ def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
Review comment:
Thanks, was working on (what I thought would be a quick fix) this for 11 hours.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625803075
Curious failure:
```
E AssertionError: default_args[start_date] matches
E assert <Pendulum [2020-05-07T22:49:48.687518+00:00]> == <Pendulum [2020-05-07T22:49:46.451514+00:00]>
```
I wonder why I didn't see that locally.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421826962
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default():
+ nonlocal ctor_params, attrname, value
Review comment:
I went through a number of iterations of way of doing this, it may be clearer just to pass them in anyway on Py3. WDYT?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421826729
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default():
+ nonlocal ctor_params, attrname, value
Review comment:
This won't work on python2, instead we'd have to do
```
def _is_default(ctor_params, attrname, value):
```
and pass em in at the call site
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422119825
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -119,11 +120,16 @@ def _is_primitive(cls, var: Any) -> bool:
@classmethod
def _is_excluded(cls, var: Any, attrname: str, instance: Any) -> bool:
"""Types excluded from serialization."""
- # pylint: disable=unused-argument
+
+ if var is None:
+ if not cls._is_constcutor_param(attrname, instance):
Review comment:
```suggestion
if not cls._is_constructor_param(attrname, instance):
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422534060
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +416,60 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default(ctor_params, attrname, value):
+ if attrname not in ctor_params:
+ return False
+ ctor_default = ctor_params[attrname].default
+
+ # Also returns True if the value is an empty list or empty dict.
+ # This is done to account for the case where the default value of
+ # the field is None but has the ``field = field or {}`` set.
+ return ctor_default is value or (ctor_default is None and value in [{}, []])
+
+ for typ in type(instance).mro():
+ ctor_params = cls.__constructor_params_for_subclass(typ)
+
+ if _is_default(ctor_params, attrname, value):
+ if typ is BaseOperator:
+ return True
+ # For added fun, if a subclass sets a different default value to the
+ # same argument, (i.e. a subclass changes default of do_xcom_push from
+ # True to False), we then do want to include it.
+ #
+ # This is because we set defaults based on BaseOperators
+ # defaults, so if we didn't set this when inflating we'd
+ # have the wrong value
+
+ base_op_ctor_params = cls.__constructor_params_for_subclass(BaseOperator)
+ if attrname not in base_op_ctor_params:
+ return True
+ return base_op_ctor_params[attrname].default == value
+
+ if typ is BaseOperator:
+ break
Review comment:
It's the subclass test I added
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422534563
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +416,60 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default(ctor_params, attrname, value):
+ if attrname not in ctor_params:
+ return False
+ ctor_default = ctor_params[attrname].default
+
+ # Also returns True if the value is an empty list or empty dict.
+ # This is done to account for the case where the default value of
+ # the field is None but has the ``field = field or {}`` set.
+ return ctor_default is value or (ctor_default is None and value in [{}, []])
+
+ for typ in type(instance).mro():
+ ctor_params = cls.__constructor_params_for_subclass(typ)
+
+ if _is_default(ctor_params, attrname, value):
+ if typ is BaseOperator:
+ return True
+ # For added fun, if a subclass sets a different default value to the
+ # same argument, (i.e. a subclass changes default of do_xcom_push from
+ # True to False), we then do want to include it.
+ #
+ # This is because we set defaults based on BaseOperators
+ # defaults, so if we didn't set this when inflating we'd
+ # have the wrong value
+
+ base_op_ctor_params = cls.__constructor_params_for_subclass(BaseOperator)
+ if attrname not in base_op_ctor_params:
+ return True
+ return base_op_ctor_params[attrname].default == value
+
+ if typ is BaseOperator:
+ break
Review comment:
👍
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422121186
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -285,10 +297,14 @@ class SerializedBaseOperator(BaseOperator, BaseSerialization):
_decorated_fields = {'executor_config'}
- _CONSTRUCTOR_PARAMS = {
- k: v for k, v in signature(BaseOperator).parameters.items()
- if v.default is not v.empty
- }
+ @staticmethod
+ @functools.lru_cache(maxsize=128)
Review comment:
Can you add a comment on the benefits / need of using `lru_cache` here
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -285,10 +297,14 @@ class SerializedBaseOperator(BaseOperator, BaseSerialization):
_decorated_fields = {'executor_config'}
- _CONSTRUCTOR_PARAMS = {
- k: v for k, v in signature(BaseOperator).parameters.items()
- if v.default is not v.empty
- }
+ @staticmethod
+ @functools.lru_cache(maxsize=128)
Review comment:
Can you add a comment on the benefits / need of using `lru_cache` here, please?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626244699
I was wrong about where that fn was used, so I've removed that complex code an replace it with this in the test instead:
```python
if serialized_task.resources is None:
assert task.resources is None or task.resources == []
else:
assert serialized_task.resources == task.resources
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626240186
Feel free to merge once we add a comment in the doc on why we kept the change.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626244825
(We've got that code in this diff comment, we can bring it back when we want it, but that will need more changes elsewhere to serialize more fields
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422375825
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default():
+ nonlocal ctor_params, attrname, value
Review comment:
Done
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625814458
Curious, the DAG with the problem has ` 'start_date': datetime.utcnow()` -- that's "wrong" but I wouldn't have expected this test to fail. I guess we are loading the dag twice, once with s10n, once without and _then_comparing.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422529496
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +416,60 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default(ctor_params, attrname, value):
+ if attrname not in ctor_params:
+ return False
+ ctor_default = ctor_params[attrname].default
+
+ # Also returns True if the value is an empty list or empty dict.
+ # This is done to account for the case where the default value of
+ # the field is None but has the ``field = field or {}`` set.
+ return ctor_default is value or (ctor_default is None and value in [{}, []])
+
+ for typ in type(instance).mro():
+ ctor_params = cls.__constructor_params_for_subclass(typ)
+
+ if _is_default(ctor_params, attrname, value):
+ if typ is BaseOperator:
+ return True
+ # For added fun, if a subclass sets a different default value to the
+ # same argument, (i.e. a subclass changes default of do_xcom_push from
+ # True to False), we then do want to include it.
+ #
+ # This is because we set defaults based on BaseOperators
+ # defaults, so if we didn't set this when inflating we'd
+ # have the wrong value
+
+ base_op_ctor_params = cls.__constructor_params_for_subclass(BaseOperator)
+ if attrname not in base_op_ctor_params:
+ return True
+ return base_op_ctor_params[attrname].default == value
+
+ if typ is BaseOperator:
+ break
Review comment:
Checking BaseOperator might be enough. I think there was a test you said was failing, do you know which one was that?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] BasPH commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
BasPH commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r421977634
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -254,7 +260,12 @@ def _deserialize_timedelta(cls, seconds: int) -> datetime.timedelta:
return datetime.timedelta(seconds=seconds)
@classmethod
- def _value_is_hardcoded_default(cls, attrname: str, value: Any) -> bool:
+ def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
Review comment:
```suggestion
def _is_constructor_param(cls, attrname: str, instance: Any) -> bool:
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625814879
Oh, cos it's parsing in a subprocess. Hmmmm!
I don't think we need to parse _all_ dags in a subprocess, just one is enough.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422132180
##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -395,6 +411,59 @@ def deserialize_operator(cls, encoded_op: Dict[str, Any]) -> BaseOperator:
return op
+ @classmethod
+ def _is_constcutor_param(cls, attrname: str, instance: Any) -> bool:
+ # Check all super classes too
+ return any(
+ attrname in cls.__constructor_params_for_subclass(typ)
+ for typ in type(instance).mro()
+ )
+
+ @classmethod
+ def _value_is_hardcoded_default(cls, attrname: str, value: Any, instance: Any) -> bool:
+ """
+ Check if ``value`` is the default value for ``attrname`` as set by the
+ constructor of ``instance``, or any of it's parent classes up
+ to-and-including BaseOperator.
+
+ .. seealso::
+
+ :py:meth:`BaseSerialization._value_is_hardcoded_default`
+ """
+
+ def _is_default():
+ nonlocal ctor_params, attrname, value
Review comment:
Yeah let's just pass them in
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-626013593
> Oh, cos it's parsing in a subprocess. Hmmmm!
>
> I don't think we need to parse _all_ dags in a subprocess, just one is enough.
Changed it to only parse example_dags (rather than all the provider dags too) in the subprocess, and the rest are just round-tripped. This cut the test time for test_serialized_dags from 8s to 4-5s on my laptop too as a bonus :)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422534178
##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -648,6 +683,23 @@ def test_dag_serialized_fields_with_schema(self):
dag_params: set = set(dag_schema.keys()) - ignored_keys
self.assertEqual(set(DAG.get_serialized_fields()), dag_params)
+ def test_operator_subclass_changing_base_defaults(self):
Review comment:
@kaxil this is the case that means we need to check the MRO for defaults.
I think storing all non-defaults is nicer anyway - we can show these in the UI that way
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#issuecomment-625803983
> Curious failure:
>
> ```
> E AssertionError: default_args[start_date] matches
> E assert <Pendulum [2020-05-07T22:49:48.687518+00:00]> == <Pendulum [2020-05-07T22:49:46.451514+00:00]>
> ```
>
> I wonder why I didn't see that locally.
somewhere `start_date` might be `.now()` causing it ?? not sure
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #8772: Correctly store non-default Nones in serialized tasks/dags
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #8772:
URL: https://github.com/apache/airflow/pull/8772#discussion_r422127805
##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -273,58 +271,70 @@ def test_deserialization(self):
for dag_id in stringified_dags:
self.validate_deserialized_dag(stringified_dags[dag_id], dags[dag_id])
- example_skip_dag = stringified_dags['example_skip_dag']
- skip_operator_1_task = example_skip_dag.task_dict['skip_operator_1']
- self.validate_deserialized_task(
- skip_operator_1_task, 'DummySkipOperator', '#e8b7e4', '#000')
-
- # Verify that the DAG object has 'full_filepath' attribute
- # and is equal to fileloc
- self.assertTrue(hasattr(example_skip_dag, 'full_filepath'))
- self.assertEqual(example_skip_dag.full_filepath, example_skip_dag.fileloc)
-
- example_subdag_operator = stringified_dags['example_subdag_operator']
- section_1_task = example_subdag_operator.task_dict['section-1']
- self.validate_deserialized_task(
- section_1_task,
- SubDagOperator.__name__,
- SubDagOperator.ui_color,
- SubDagOperator.ui_fgcolor
- )
-
def validate_deserialized_dag(self, serialized_dag, dag):
"""
Verify that all example DAGs work with DAG Serialization by
checking fields between Serialized Dags & non-Serialized Dags
"""
- fields_to_check = [
- "task_ids", "params", "fileloc", "max_active_runs", "concurrency",
- "is_paused_upon_creation", "doc_md", "safe_dag_id", "is_subdag",
- "catchup", "description", "start_date", "end_date", "parent_dag",
- "template_searchpath", "_access_control", "dagrun_timeout"
- ]
-
- # fields_to_check = dag.get_serialized_fields()
+ fields_to_check = dag.get_serialized_fields() - {
+ # Doesn't implement __eq__ properly. Check manually
+ 'timezone',
+
+ # Need to check fields in it, to exclude functions
+ 'default_args',
+ }
for field in fields_to_check:
- self.assertEqual(getattr(serialized_dag, field), getattr(dag, field))
+ assert getattr(serialized_dag, field) == getattr(dag, field), f'{field} matches'
+
+ if dag.default_args:
+ for k, v in dag.default_args.items():
+ if callable(v):
+ # Check we stored _someting_.
+ assert k in serialized_dag.default_args
+ else:
+ assert v == serialized_dag.default_args[k], f'default_args[{k}] matches'
Review comment:
```suggestion
assert v == serialized_dag.default_args[k], f'default_args[{k}] does not match'
```
?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org