You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/01 08:29:51 UTC

[GitHub] [airflow] wanlce opened a new pull request, #22658: Fix incorrect data_interval_start due to scheduling time change

wanlce opened a new pull request, #22658:
URL: https://github.com/apache/airflow/pull/22658

   This PR fixes the incorrect first run of data_interval_start after
   changing the scheduling time.
   
   * Added _align_to_prev function for scheduling alignment after time change.
   
   * renamed _align to _align_to_next.
   
   related: #21715 
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #21715 
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #22658:
URL: https://github.com/apache/airflow/pull/22658#issuecomment-1207792295

   I think  what woudl be great here is to add a few more tests showing the different cases  and maybe explaining the context in the test a bit better?
   
   The case here is either not tested or it's not clear that it is tested I think: 
   
   ```
       def _align_to_prev(self, current: DateTime) -> DateTime:
           """Get the prev scheduled time.
           This is ``current - interval``, unless ``current`` falls right on the
           interval boundary, when ``current`` is returned.
           """
           prev_time = self._get_prev(current)
           if self._get_next(prev_time) != current:
               return prev_time
           return current
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #22658:
URL: https://github.com/apache/airflow/pull/22658#discussion_r848046903


##########
airflow/timetables/interval.py:
##########
@@ -58,6 +58,21 @@ def _align(self, current: DateTime) -> DateTime:
         """
         raise NotImplementedError()
 
+    def _align_to_prev(self, current: DateTime) -> DateTime:
+        """Align given time to the scheduled.
+
+        For fixed schedules (e.g. every midnight); this finds the prev time that
+        aligns to the declared time, if the given time does not align. If the
+        schedule is not fixed (e.g. every hour), the given time is returned.
+
+        It is not enough to just have _align_to_next. When a DAG's schedule
+        changed, _algin_to_next does not correct the scheduling time in time,
+        resulting in the first scheduling after the scheduling time has been
+        changed remaining the same. So we would need to align forward to ensure
+        that it works correctly after a scheduling time change.

Review Comment:
   ```suggestion
           It is not enough to use ``_get_prev(_align_to_next())`` instead. When a
           DAG's schedule changes, this alternative would make the first scheduling
           after the schedule change remain the same. We need to align forward to
           ensure it works correctly in this situation.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #22658:
URL: https://github.com/apache/airflow/pull/22658#issuecomment-1207705286

   Anyone else wants to take a look at this one? It’s a relatively straightforward change, since we established that this additional method is needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] wanlce commented on a diff in pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
wanlce commented on code in PR #22658:
URL: https://github.com/apache/airflow/pull/22658#discussion_r848032037


##########
airflow/timetables/interval.py:
##########
@@ -76,21 +80,23 @@ def next_dagrun_info(
         if not restriction.catchup:
             earliest = self._skip_to_latest(earliest)
         elif earliest is not None:
-            earliest = self._align(earliest)
+            earliest = self._align_to_next(earliest)
         if last_automated_data_interval is None:
             # First run; schedule the run at the first available time matching
             # the schedule, and retrospectively create a data interval for it.
             if earliest is None:
                 return None
             start = earliest
-        else:  # There's a previous run.
+        else:  # There's a previous run

Review Comment:
   Hello



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
uranusjr commented on PR #22658:
URL: https://github.com/apache/airflow/pull/22658#issuecomment-1209160661

   Actually this is also the cause of #23689. I’ll add a few test cases here to illustrate those isues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] wanlce closed pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
wanlce closed pull request #22658: Fix incorrect data_interval_start due to scheduling time change
URL: https://github.com/apache/airflow/pull/22658


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #22658: Fix incorrect data_interval_start due to scheduling time change

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22658:
URL: https://github.com/apache/airflow/pull/22658#issuecomment-1096201424

   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ashb merged pull request #22658: Fix incorrect data interval alignment due to assumption on input time alignment

Posted by GitBox <gi...@apache.org>.
ashb merged PR #22658:
URL: https://github.com/apache/airflow/pull/22658


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org