You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/29 19:39:57 UTC

[GitHub] [airflow] Acehaidrey commented on a change in pull request #9544: Add metric for scheduling delay between first run task & expected start time

Acehaidrey commented on a change in pull request #9544:
URL: https://github.com/apache/airflow/pull/9544#discussion_r447208060



##########
File path: airflow/models/dagrun.py
##########
@@ -411,6 +412,44 @@ def _are_premature_tis(
                 return True
         return False
 
+    @provide_session
+    def _emit_true_scheduling_delay_stats_for_finished_state(self, session=None):
+        """
+        This is a helper method to emit the true scheduling delay stats, which is defined as
+        the time when the first task in DAG starts minus the expected DAG run datetime.
+        This method will be used in the update_state method when the state of the DagRun
+        is updated to a completed status (either success or failure). The method will find the first
+        started task within the DAG and calculate the expected DagRun start time (based on
+        dag.execution_date & dag.schedule_interval), and minus these two to get the delay.
+
+        The emitted data may contains outlier (e.g. when the first task was cleared, so
+        the second task's start_date will be used), but we can get ride of the the outliers
+        on the stats side through the dashboards.
+
+        Note, the stat will only be emitted if the DagRun is a scheduler triggered one
+        (i.e. external_trigger is False).
+        """
+        if self.state == State.RUNNING:
+            return
+
+        try:
+            if self.external_trigger:
+                return
+            # Get the task that has the earliest start_date
+            qry = session.query(TI).filter(
+                TI.dag_id == self.dag_id,
+                TI.execution_date == self.execution_date,
+                TI.start_date.isnot(None),
+            ).order_by(TI.start_date.asc())
+            ti = qry.first()
+            dag = self.get_dag()
+            if ti and dag:

Review comment:
       so dag.tasks doesn't have the start_date that it actually began with. That is saved in the db, not the start_date that is assigned in the default args. Because the intent is to get the real start time that this begins with, vs the dagrun start time




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org