Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/02/24 13:07:56 UTC

[GitHub] [airflow] ashb commented on a change in pull request #7489: [AIRFLOW-6869][WIP] Bulk fetch DAGRuns for _process_task_instances

ashb commented on a change in pull request #7489: [AIRFLOW-6869][WIP] Bulk fetch DAGRuns for _process_task_instances
URL: https://github.com/apache/airflow/pull/7489#discussion_r383253824
 
 

 ##########
 File path: airflow/jobs/scheduler_job.py
 ##########
 @@ -684,11 +687,36 @@ def _process_dags(self, dagbag, dags, tis_out):
         :type dagbag: airflow.models.DagBag
         :param dags: the DAGs from the DagBag to process
         :type dags: List[airflow.models.DAG]
-        :param tis_out: A list to add generated TaskInstance objects
-        :type tis_out: list[TaskInstance]
-        :rtype: None
+        :rtype: list[TaskInstance]
+        :return: A list of generated TaskInstance objects
         """
         check_slas = conf.getboolean('core', 'CHECK_SLAS', fallback=True)
+
+        tis_out = []
+        dag_ids = [dag.dag_id for dag in dags]
+        dag_runs = DagRun.find(dag_ids=dag_ids, state=State.RUNNING, session=session)
+        # list() is needed because of a bug in Python 3.7+
+        #
+        # The following code returns different values depending on the Python version
+        # from itertools import groupby
+        # from unittest.mock import MagicMock
+        # key = "key"
+        # item = MagicMock(attr=key)
+        # items = [item]
+        # items_by_attr = {k: v for k, v in groupby(items, lambda d: d.attr)}
+        # print("items_by_attr=", items_by_attr)
+        # item_with_key = list(items_by_attr[key]) if key in items_by_attr else []
+        # print("item_with_key=", item_with_key)
+        #
+        # Python 3.7+:
+        # items_by_attr= {'key': <itertools._grouper object at 0x7f3b9f38d4d0>}
+        # item_with_key= []
+        #
+        # Python 3.6:
+        # items_by_attr= {'key': <itertools._grouper object at 0x101128630>}
+        # item_with_key= [<MagicMock id='4310405416'>]
 
 Review comment:
   The behaviour differs between Python 3.6 and 3.7, but it is still wrong on both when `items` contains more than a single item: 3.6 would return only the last item.
   
   The [docs for groupby](https://docs.python.org/3/library/itertools.html#itertools.groupby) say:
   
   > Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
   
   Since the behaviour without `list()` is broken in other ways on 3.6 too, I think we can just replace this comment with:
   
   ```
            # As per the docs of groupby (https://docs.python.org/3/library/itertools.html#itertools.groupby)
            # we need to use `list()` otherwise the result will be wrong/incomplete
   ```
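
To illustrate the point about `list()`, here is a minimal sketch, not the code from this PR. `Run` is a hypothetical stand-in for `DagRun`, and the input is assumed to be already sorted by `dag_id`, since `groupby` only groups consecutive items. It contrasts storing the group iterators with materialising each group while the `groupby` object is still positioned on it:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for DagRun: only the fields needed for the demo.
Run = namedtuple("Run", ["dag_id", "run_id"])

# groupby only groups consecutive items, so the input is sorted by dag_id.
runs = [Run("a", 1), Run("a", 2), Run("b", 3)]

# Wrong: the dict stores the group iterators themselves. By the time they
# are consumed, the shared groupby object has already been advanced.
lazy = {key: group for key, group in groupby(runs, key=attrgetter("dag_id"))}
lazy_result = {key: list(group) for key, group in lazy.items()}

# Right: each group is turned into a list before groupby moves on.
eager = {key: list(group) for key, group in groupby(runs, key=attrgetter("dag_id"))}

print("lazy_result =", lazy_result)
print("eager       =", eager)
```

The exact contents of `lazy_result` depend on the Python version, but on none of them does it contain every run, whereas `eager` holds the full list for each `dag_id`.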
