You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/22 08:57:24 UTC

[GitHub] [airflow] Jorricks commented on a change in pull request #16554: Feature: add serialisation support for custom DAGs and BaseOperators

Jorricks commented on a change in pull request #16554:
URL: https://github.com/apache/airflow/pull/16554#discussion_r656018821



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -186,13 +186,18 @@ def serialize_to_json(
             if cls._is_excluded(value, key, object_to_serialize):
                 continue
 
-            if key in decorated_fields:
-                serialized_object[key] = cls._serialize(value)
-            else:
-                value = cls._serialize(value)
-                if isinstance(value, dict) and "__type" in value:
+            value = cls._serialize(value)
+            if key not in decorated_fields and isinstance(value, dict) and "__type" in value:
+                # Extra check to make sure for custom operators or dags that the non-standard
+                # serialized_fields still support non-primitive types (e.g. dicts).
+                if isinstance(object_to_serialize, DAG) and key in DAG.get_serialized_fields():
+                    value = value["__var"]
+                elif (
+                    isinstance(object_to_serialize, BaseOperator)
+                    and key in BaseOperator.get_serialized_fields()
+                ):

Review comment:
       It's a bit of a pickle. I agree. Let me explain it with an example.
   I copied this example from the test case.
   ```python
           class MyOperator(BaseOperator):
               __serialized_fields: Optional[frozenset] = None
   
               def __init__(self, a_dictionary_value: Dict, **kwargs):
                   super().__init__(**kwargs)
                   self.a_dictionary_value = a_dictionary_value
   
               @classmethod
               def get_serialized_fields(cls):
                   if not cls.__serialized_fields:
                       custom_fields = {"a_dictionary_value"}
                       cls.__serialized_fields = frozenset(super().get_serialized_fields() | custom_fields)
                   return cls.__serialized_fields
   ```
   In the case when we serialize this operator, and the extra field is a complex type, it will fail during deserialization because of the above clause. It is then trying to find a `__var` and `__type` field that is not there anymore. 
   Now the proposal here, is that anything that is out-side the standard functionality of the BaseOperator or DAG, is serialized in the full way with a `__var` and `__type` dictionary.
   
   Thereby, if we would call `object_to_serialize.get_serialized_fileds()` on this object, `a_dictionary_value` would again be passed as the same value (just a dict without `__var` and `__type`) which would give an exception once we deserialize.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org