Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/05 13:09:26 UTC

[GitHub] [airflow] ashb opened a new pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

ashb opened a new pull request #12835:
URL: https://github.com/apache/airflow/pull/12835


   DAGs with a schedule interval of None or `@once` don't have a following
   schedule, so we can't meaningfully calculate this metric.
   
   Additionally, this changes the emitted metric from seconds to
   milliseconds. All timers sent to StatsD should be in milliseconds,
   because that is what StatsD and the apps that consume its data expect.
   See #10629 for more details.
   
   This is a "breaking" change relative to 1.10.14, to which the metric was
   back-ported but where it (incorrectly) emitted seconds.
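
To illustrate the unit change, here is a minimal sketch of converting a delay to milliseconds before it is sent as a StatsD timer. It is not the code from this PR: the helper names `timedelta_to_ms` and `statsd_timer_line`, the example timestamps, and the full metric name `dagrun.example_dag.first_task_scheduling_delay` are assumptions made for illustration. The only thing it relies on is the StatsD timer line format `<name>:<value>|ms`.

```python
from datetime import datetime, timedelta, timezone


def timedelta_to_ms(delta: timedelta) -> float:
    """Convert a timedelta to milliseconds, the unit StatsD timers use."""
    return delta.total_seconds() * 1000.0


def statsd_timer_line(name: str, delta: timedelta) -> str:
    """Format a StatsD timer line as <name>:<value>|ms (hypothetical helper)."""
    return f"{name}:{int(round(timedelta_to_ms(delta)))}|ms"


# Hypothetical values: the run was expected to start at 13:00:00 UTC and the
# first task actually started 2.5 seconds later.
expected_start = datetime(2020, 12, 5, 13, 0, 0, tzinfo=timezone.utc)
first_start = expected_start + timedelta(seconds=2, milliseconds=500)

line = statsd_timer_line(
    "dagrun.example_dag.first_task_scheduling_delay",  # assumed metric name
    first_start - expected_start,
)
print(line)
```

A dashboard that treated the 1.10.14 values as milliseconds would have shown delays 1000 times too small, which is why the unit change needs to be called out.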
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb merged pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

Posted by GitBox <gi...@apache.org>.
ashb merged pull request #12835:
URL: https://github.com/apache/airflow/pull/12835


   





[GitHub] [airflow] Acehaidrey commented on a change in pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

Posted by GitBox <gi...@apache.org>.
Acehaidrey commented on a change in pull request #12835:
URL: https://github.com/apache/airflow/pull/12835#discussion_r537266508



##########
File path: airflow/models/dagrun.py
##########
@@ -573,23 +573,29 @@ def _emit_true_scheduling_delay_stats_for_finished_state(self, finished_tis):
         Note, the stat will only be emitted if the DagRun is a scheduler triggered one
         (i.e. external_trigger is False).
         """
+        if self.state == State.RUNNING:
+            return
+        if self.external_trigger:
+            return
+        if not finished_tis:
+            return
+
         try:
-            if self.state == State.RUNNING:
-                return
-            if self.external_trigger:
-                return
-            if not finished_tis:
-                return
             dag = self.get_dag()
+
+            if not self.dag.schedule_interval or self.dag.schedule_interval == "@once":
+                # We can't emit this metric if there is no following schedule to calculate from!
+                return
+
             ordered_tis_by_start_date = [ti for ti in finished_tis if ti.start_date]
             ordered_tis_by_start_date.sort(key=lambda ti: ti.start_date, reverse=False)
             first_start_date = ordered_tis_by_start_date[0].start_date

Review comment:
       Hey @XD-DENG, yes, agreed: you can do that and simplify this :) Feel free to add that refactor!







[GitHub] [airflow] ashb commented on pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #12835:
URL: https://github.com/apache/airflow/pull/12835#issuecomment-739249053


   Ping @Acehaidrey -- this fixes a bug (which, thankfully, wasn't fatal because of the try/except) and also changes the "unit" from seconds to milliseconds.





[GitHub] [airflow] Acehaidrey commented on pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

Posted by GitBox <gi...@apache.org>.
Acehaidrey commented on pull request #12835:
URL: https://github.com/apache/airflow/pull/12835#issuecomment-739707529


   Thank you, team, for adding this check and fixing my issue! Sorry for missing this check.





[GitHub] [airflow] XD-DENG commented on a change in pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #12835:
URL: https://github.com/apache/airflow/pull/12835#discussion_r536811076



##########
File path: airflow/models/dagrun.py
##########
@@ -573,23 +573,29 @@ def _emit_true_scheduling_delay_stats_for_finished_state(self, finished_tis):
         Note, the stat will only be emitted if the DagRun is a scheduler triggered one
         (i.e. external_trigger is False).
         """
+        if self.state == State.RUNNING:
+            return
+        if self.external_trigger:
+            return
+        if not finished_tis:
+            return
+
         try:
-            if self.state == State.RUNNING:
-                return
-            if self.external_trigger:
-                return
-            if not finished_tis:
-                return
             dag = self.get_dag()
+
+            if not self.dag.schedule_interval or self.dag.schedule_interval == "@once":
+                # We can't emit this metric if there is no following schedule to calculate from!
+                return
+
             ordered_tis_by_start_date = [ti for ti in finished_tis if ti.start_date]
             ordered_tis_by_start_date.sort(key=lambda ti: ti.start_date, reverse=False)
             first_start_date = ordered_tis_by_start_date[0].start_date

Review comment:
       A question not related to the changes made in this PR: why don't we directly use something like
   
   `first_start_date = min(ti.start_date for ti in finished_tis)`
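
One caveat with the one-liner above: the existing code filters on `ti.start_date`, which suggests some task instances can have no start date, and a bare `min()` over a mix of `None` and datetimes would raise `TypeError`. A minimal sketch that keeps the filter might look like this. The function name `first_start_date` and the `SimpleNamespace` stand-ins for task instances are assumptions for illustration, not Airflow code.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def first_start_date(finished_tis):
    """Return the earliest non-empty start_date, or None if there is none."""
    # The generator skips task instances without a start_date, and
    # default=None avoids a ValueError when nothing is left to compare.
    return min(
        (ti.start_date for ti in finished_tis if ti.start_date),
        default=None,
    )


# Hypothetical stand-ins for finished task instances.
tis = [
    SimpleNamespace(start_date=datetime(2020, 12, 5, 13, 0, 5, tzinfo=timezone.utc)),
    SimpleNamespace(start_date=None),
    SimpleNamespace(start_date=datetime(2020, 12, 5, 13, 0, 1, tzinfo=timezone.utc)),
]

earliest = first_start_date(tis)
```

This does a single pass instead of building and sorting a list, and it returns `None` rather than raising when no task instance has started.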







[GitHub] [airflow] github-actions[bot] commented on pull request #12835: Don't emit first_task_scheduling_delay metric for only-once dags

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12835:
URL: https://github.com/apache/airflow/pull/12835#issuecomment-739255153


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

