Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/15 21:46:58 UTC

[GitHub] [airflow] kaxil opened a new pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

kaxil opened a new pull request #15395:
URL: https://github.com/apache/airflow/pull/15395


   This bug was introduced in #14909.
   
   The following DAG fails with: `TypeError: '<' not supported between instances of 'dict' and 'dict'`
   
   ```python
   from airflow import models
   from airflow.operators.dummy import DummyOperator
   from datetime import datetime, timedelta
   params = {
       "staging_schema": [{"key:":"foo","value":"bar"},
                          {"key:":"this","value":"that"}]
   }
   
   with models.DAG(dag_id='test-dag',
                   start_date=datetime(2019, 2, 14),
                   schedule_interval='30 13 * * *',
                   catchup=False,
                   max_active_runs=1,
                   params=params
                   ) as dag:
       my_task = DummyOperator(
           task_id='task1'
       )
   ```
   
   Full Error:
   
   ```
     File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 210, in <dictcomp>
       return cls._encode({str(k): cls._serialize(v) for k, v in var.items()}, type_=DAT.DICT)
     File "/usr/local/lib/python3.7/site-packages/airflow/serialization/serialized_objects.py", line 212, in _serialize
       return sorted(cls._serialize(v) for v in var)
   TypeError: '<' not supported between instances of 'dict' and 'dict'
   During handling of the above exception, another exception occurred:
   ...
   ```
   
   This is because `sorted()` cannot order dicts: Python defines no `<` comparison between two dicts.
   It also fails when a list/set/dict holds values of multiple types that cannot be compared with each other.
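
A minimal sketch of the failure mode described above, in plain Python rather than Airflow itself. The helper name `try_sorted` is hypothetical and exists only for illustration; it returns the error text instead of raising so the cases can be compared side by side.

```python
def try_sorted(values):
    """Return sorted(values), or the TypeError text if the elements cannot be ordered."""
    try:
        return sorted(values)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Dicts define no ordering, so comparing two of them with '<' raises.
print(try_sorted([{"key": "foo"}, {"key": "this"}]))

# Mixed types fail the same way: int and str are not mutually orderable.
print(try_sorted([2, "x"]))

# Homogeneous, orderable values sort as usual.
print(try_sorted([2, 0, 3]))
```

The first call reproduces the message from the traceback, because `sorted()` has to compare the two dicts to decide their order.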
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterables

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #15395:
URL: https://github.com/apache/airflow/pull/15395


   





[GitHub] [airflow] github-actions[bot] commented on pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#issuecomment-820756885


   The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] ashb commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614416666



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       Ah, iterable then.







[GitHub] [airflow] kaxil commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614416932



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       oh yeah







[GitHub] [airflow] ashb commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614413116



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       This is passed much more than just an iterator, so this comment needs updating/expanding please.







[GitHub] [airflow] kaxil commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614431374



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -240,10 +240,10 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
             return str(get_python_source(var))
         elif isinstance(var, set):
             # FIXME: casts set to list in customized serialization in future.
-            return cls._encode(sorted(cls._serialize(v) for v in var), type_=DAT.SET)
+            return cls._encode(cls._serialize_and_sort(var), type_=DAT.SET)
         elif isinstance(var, tuple):
             # FIXME: casts tuple to list in customized serialization in future.
-            return cls._encode(sorted(cls._serialize(v) for v in var), type_=DAT.TUPLE)
+            return cls._encode(cls._serialize_and_sort(var), type_=DAT.TUPLE)

Review comment:
       Fixed thanks, you are right







[GitHub] [airflow] kaxil commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614414265



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       hmm 🤔  -- we only pass an iterator (one of list, set, tuple), no?
   
   Example:
   
   ```python
   _serialize_and_sort([2, "x"])
   ```
   
   or
   
   ```python
   _serialize_and_sort((2, "x"))
   ```
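
One way a helper like this could avoid the `TypeError` on an input such as `[2, "x"]` is to attempt the sort and fall back to the incoming order when the elements cannot be compared. This is only a sketch under that assumption: `sort_if_possible` is a hypothetical name, and it is not claimed to match what `_serialize_and_sort` does in this PR.

```python
def sort_if_possible(values):
    """Hypothetical helper: sort when elements are mutually orderable, else keep the given order."""
    items = list(values)
    try:
        return sorted(items)
    except TypeError:
        # Elements such as dicts, or a mix of int and str, cannot be compared with '<'.
        return items


print(sort_if_possible({3, 1, 2}))                # orderable elements are sorted
print(sort_if_possible([2, "x"]))                 # mixed types: order left as given
print(sort_if_possible([{"a": 1}, {"b": 2}]))     # dicts: order left as given
```

The trade-off is that the fallback branch gives up a deterministic order for sets, since set iteration order is not something callers should rely on.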
   







[GitHub] [airflow] kaxil commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614418253



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       Updated in https://github.com/apache/airflow/pull/15395/commits/237af383107ac492a486770eb269f27c3769bc5d







[GitHub] [airflow] ashb commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614418669



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -240,10 +240,10 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
             return str(get_python_source(var))
         elif isinstance(var, set):
             # FIXME: casts set to list in customized serialization in future.
-            return cls._encode(sorted(cls._serialize(v) for v in var), type_=DAT.SET)
+            return cls._encode(cls._serialize_and_sort(var), type_=DAT.SET)
         elif isinstance(var, tuple):
             # FIXME: casts tuple to list in customized serialization in future.
-            return cls._encode(sorted(cls._serialize(v) for v in var), type_=DAT.TUPLE)
+            return cls._encode(cls._serialize_and_sort(var), type_=DAT.TUPLE)

Review comment:
       Wait -- isn't this a bug? We shouldn't be sorting tuples or lists -- that means we've changed
   
   `[ 2, 0, 3 ]`
   
   into
   
   `[0, 2, 3]`, which is different.
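
The concern can be illustrated in plain Python: sorting during serialization loses the order of a tuple or list, whereas a set has no order to lose. The function `roundtrip_with_sort` below is a hypothetical stand-in for a serialize/deserialize cycle that sorts; it is not Airflow code.

```python
def roundtrip_with_sort(value):
    """Hypothetical stand-in: rebuild a container of the same type from its sorted elements."""
    return type(value)(sorted(value))


original_tuple = (2, 0, 3)
original_list = [2, 0, 3]
original_set = {2, 0, 3}

# Sequences compare element by element, so a reordered copy is a different value.
print(roundtrip_with_sort(original_tuple) == original_tuple)  # False
print(roundtrip_with_sort(original_list) == original_list)    # False

# Sets compare by membership only, so sorting their elements is harmless.
print(roundtrip_with_sort(original_set) == original_set)      # True
```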







[GitHub] [airflow] kaxil commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614414265



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       hmm 🤔  -- we only pass an iterator, no?
   
   Example:
   
   ```python
   _serialize_and_sort([2, "x"])
   ```
   
   or
   
   ```python
   _serialize_and_sort((2, "x"))
   ```
   







[GitHub] [airflow] ashb commented on a change in pull request #15395: Bugfix: ``TypeError`` when Serializing & sorting iterators

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #15395:
URL: https://github.com/apache/airflow/pull/15395#discussion_r614416975



##########
File path: airflow/serialization/serialized_objects.py
##########
@@ -252,6 +252,14 @@ def _serialize(cls, var: Any) -> Any:  # Unfortunately there is no support for r
 
     # pylint: enable=too-many-return-statements
 
+    @classmethod
+    def _serialize_and_sort(cls, var):
+        """Serialized and Sort the values in the iterator"""

Review comment:
       https://docs.python.org/3/library/stdtypes.html#iterator-types <-- you didn't mean one of these.
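
For readers following the terminology point, a short sketch of the distinction using the standard library's abstract base classes: a list, set, or tuple is an iterable, while an iterator is the single-use object returned by `iter()`. The variable names are illustrative only.

```python
from collections.abc import Iterable, Iterator

values = [2, 0, 3]

# A list is iterable, but it is not itself an iterator.
print(isinstance(values, Iterable))  # True
print(isinstance(values, Iterator))  # False

# iter() returns an iterator, which is consumed as it is read.
values_iterator = iter(values)
print(isinstance(values_iterator, Iterator))  # True

first_pass = list(values_iterator)
second_pass = list(values_iterator)
print(first_pass)   # [2, 0, 3]
print(second_pass)  # []
```

An iterable can be traversed repeatedly, which is why a docstring describing list, set, and tuple arguments reads more accurately with "iterable".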



