You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/09 10:03:05 UTC

[GitHub] [airflow] LionelZhao28 opened a new pull request #17502: localize the dag_run

LionelZhao28 opened a new pull request #17502:
URL: https://github.com/apache/airflow/pull/17502


   Why I commit this change:  
     * When checking the `airflow` features, I found that the dag `run_id` is generated using the UTC time, but actually, it is not convenient for us to use the UTC time. 
     * For example, I am in Beijing now, I ran a DAG at 19:00, the default `run_id` is `manual__2021-09-08T11:01:02.022226+08:00`, but after a few hours, I want to find this run, I have to parse the local time to UTC time to get the specific dag_run.
   
   How do I change:
     * Added a config `localize_dag_run_id` in the `core` config and the default value is False. It takes no effect if the user doesn't care about the dag run_id format. If the users who just like me want to use the local time to generate the `run_id`, they can set it to `True`
   
   Result:
   
   ![image](https://user-images.githubusercontent.com/7628879/128689674-2e1ed279-c5fc-4ecb-be9a-b1704bb07073.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 edited a comment on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-900929938


   Commit the new changes.
   There are two new changes:
   1. remove `__DST` suffix in the `run_id` as the offset will change when changing the DST, 
   2. After discussing @ashb , use dag `timezone` instead of `default_timezone` make sense,  we have three ways to do this:
       1.  We use the `execution_date` timezone directly. But I found the `execution_date` is UTC already, and if we transform it before calling the method `DagRun.generate_run_id`, the changes will have a big risk because it may affect the task scheduler. So give it up
       2. change the method `DagRun.generate_run_id` from staticmethod to classmethod, and it leads that there must be a dag model before calling the method. So give it up
       3. add a parameter `dag_timezone` in the method `generate_run_id`. I scan the usage of this method in the project and we can get the dag_time in all the calling codes, and there are not too many actually. So I follow this way.
   
   Actually, I'm still not sure about the second change, we can continue the discussion about it here . 
   
   Thanks a lot 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895111608


   > I kind of feel maybe we should just use this behaviour all the time. The run ID is only for identifying the run, and using UTC has no advantage beside being easy to implement.
   
   I agree that the `run ID` is only for identifying the run, but for the user, they may want to locate the specific run to check the dag logs, especially when debugging or troubleshooting, they may run the dag serval times manually. If the dag `run ID` is generated using the local time, it is really convenient for them.
   It will not increase the compatibility for our system to allow the user to localize the `run_id`, we will be very happy to see this change. 
   I've talked with some friends who are using the airflow, actually, most of them change the sources codes here and re-compile the project to launch.
   
   > Either way though, you need some tests to ensure this works.
   I've tested the code changes, setting this value to `True`, `False`, and with this config missing. It works fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 edited a comment on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895670688


   > I think local_run_id might be misleading - if people will see different value in the UI and in the DB that might be quite confusing. Plus it adds unnecesary field in the DB. The run_id is used throughout the whole UI and It would be quite a change to use 'local' version all over the UI. And it will be even harder to test. I agree with @uranusjr that using local timezone settings always is a good idea, but it has to be thoroughly unit-tested.
   
   Agree.
   I've set this PR to WIP.
   I will add the unit test about it.
   Because we don't use the `daylight saving time` in China,  I will be very glad if you can help me to do this if possible. Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690407650



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       good idea. It leads to a small change for the project.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690845281



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   I checked the codes, we can not use the `tzinfo` in the execution_date directly, because it has been transformed into UTC already.
   ![image](https://user-images.githubusercontent.com/7628879/129824951-cf0ff5bd-916e-42af-82eb-1c9315afbeab.png)
   
   So we have two ways to implement:
   1. change the method generate_run_id from static method to class method
   2. add a parameter `timezone` in the method `generate_run_id` and the default value is settings.TIMEZONE.
   
   Which one do you prefer? Or there is something I miss that we can have a better choice?
   Let me know what do you think.
   Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690287118



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)
+            is_dst = local_time.is_dst()
+            if is_dst:
+                return f"{run_type}__{local_time.isoformat()}__DST"

Review comment:
       I don't see the need for this suffix -- when it's in DST the timezone offset will already be different (`+00:00` vs `+01:00__DST` in your example for UK)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690394802



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       it makes sense. 
   But we have to change the method from static_method to class method.
   I will change it tomorrow




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895098947


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895677915


   I want to know, in your local life, if the time is conflict, how do you distinguish them during the daylight saving time switch.
   For example, let us meet at 2:00 am, well, which 2:00 am?
   😓


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690286002



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       It should be based on the _dag_ timezone, not the default timezone.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690400175



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       Oh hmmm. It can't even be a class method. Maybe we should just pass the date in the "right" TZ here already? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690455602



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   I thought we can get the dag timezone by the `start_date` or `end_date`, but actually, the two values may be `None` if the user doesn't set them.
   So the question is that which `timezone` do you mean by `dag timezone`? 
   The timezone of `execution_date`?  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-900929938


   Commit the new changes.
   There are two new changes:
   1. remove `__DST` suffix in the `run_id` as the offset will change when changing the DST, 
   2. After discussing @ashb , use dag `timezone` instead of `default_timezone` make sense,  we have three ways to do this:
       1. [X] We use the `execution_date` timezone directly. But I found the `execution_date` is UTC already, and if we transform it before calling the method `DagRun.generate_run_id`, the changes will have a big risk because it may affect the task scheduler,
       2. [X] change the method `DagRun.generate_run_id` from staticmethod to classmethod, and it leads that there must be a dag model before calling the method. 
       3. [V] add a parameter `dag_timezone` in the method `generate_run_id`. I scan the usage of this method in the project and we can get the dag_time in all the calling codes, and there are not too many actually. So I follow this way.
   
   Actually, I'm still not sure about the second change, we can continue the discussion about it here . 
   
   Thanks a lot 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r689626038



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)
+            is_dst = local_time.is_dst()
+            if is_dst:
+                return f"{run_type}__{local_time.isoformat()}__DST"
+            else:
+                return f"{run_type}__{local_time.isoformat()}"
+        else:
+            return f"{run_type}__{execution_date.isoformat()}"

Review comment:
       This can have unintended consequences -- we will have to look closely.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690398707



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)
+            is_dst = local_time.is_dst()
+            if is_dst:
+                return f"{run_type}__{local_time.isoformat()}__DST"

Review comment:
       Actually, I don't know the living habits who are under DST, because we don't use the DST in China.
   I just think it will be much clearer if we add `__DST`.
   I will remove it tomorrow




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690407650



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       good idea. It will be a small change for the project.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895796041


   > whether the `run_id` is generated in DST
   
   This would be a bit difficult to determine unless the timezone implementation correctly implements [`dst`](https://docs.python.org/3/library/datetime.html#datetime.tzinfo.dst), or if the implementation uses a different timezone altogether. Maybe `pendulum` is robust enough to make this viable, maybe not, you’ll need to do some tests on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895124301


   @uranusjr wrota about adding unit tests. 
   
   And also there is one serious caveat (and one that should be covered by the unit tests) as well for sure is the DST change. UTC is not only easy to implement but guarantees conflict avoidance. With local time, if you have hourly scheduled DAG, you have a guaranteed conflict once a year - you will get the same run id conflict (when you move clock backwards). So unless you make sure that timezone is included and reflected properly, you might have a problem. 
   
   I think in this case, the time zone offset will be different when you use isoformat, but it definitely need testing (and what I mean by that unit testing) - so that we can avoid regressions and you actually have shown that you thought and tested that case consciously. 
   
   Also i think it would be great to have parameterized unit 
   test with a number of more or less probable and 'weird' timezones. There are certain timezones (with half or even quarter-hour shifts) so it would.be great to see those tested as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895793877


   > `:00:00+01:00`.
   
   If that, I think it will work fine if we can make sure the DST offset is in the `run_id`.
   Or, we can add another filed to show whether the `run_id` is generated in DST, just like "manual__2021-08-08T08:08:08.217033__dst",if it is not in `DST`, the run_id is `manual__2021-08-08T08:08:08.217033`.
   For the area that doesn't use `DST`, we can add another config to disable the `__dst` appending.
   Do you think it is ok?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690458047



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   Just confirm that the `dag timezone` is the value that is inited here? 
   https://github.com/apache/airflow/blob/e99624d2f17f0aadcb992827e9e721f5e6c978d8/airflow/models/dag.py#L326
   Can we use the `tzinfo` in the `execution_date` directly? 
   Or we need to add a parameter `dag_timezone` in the method `generate_run_id` and set the value when calling this method?
   Or transform the date to dag timezone datetime outside?

##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   I checked the codes, we can not use the `tzinfo` in the execution_date directly, because it has been transformed into UTC already.
   ![image](https://user-images.githubusercontent.com/7628879/129824951-cf0ff5bd-916e-42af-82eb-1c9315afbeab.png)
   
   So we have two ways to implement:
   1. change the method generate_run_id from static method to class method
   2. add a parameter `timezone` in the method `generate_run_id` and the default value is settings.TIMEZONE.
   
   Which one do you prefer? Or there is something I miss that we can have a better choice?
   Let me know what do you think.
   Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895192603


   > And also there is one serious caveat (and one that should be covered by the unit tests) as well for sure is the DST change. UTC is not only easy to implement but guarantees conflict avoidance. With local time, if you have hourly scheduled DAG, you have a guaranteed conflict once a year - you will get the same run id conflict (when you move clock backwards). So unless you make sure that timezone is included and reflected properly, you might have a problem.
   > 
   > I think in this case, the time zone offset will be different when you use isoformat, but it definitely needs testing (and what I mean by that unit testing) - so that we can avoid regressions and you actually have shown that you thought and tested that case consciously.
   > 
   > Also i think it would be great to have parameterized unit
   > test with a number of more or less probable and 'weird' timezones. There are certain timezones (with half or even quarter-hour shifts) so it would.be great to see those tested as well.
   
   I see, it makes sense that the run_id may be conflicting if the system timezone is changed. But actually, for our usage, we really need a local run_id to show to find the specific dag run. 
   But  I've got another solution that we can add another filed name `local_run_id`, it is transformed from UTC to local time only on the page, the `local time` is the same as the timezone that the user setting on the right top on the page. So we only need to change the js or transform view logic.
   Do you think it is much better?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-945308470


   give up.
   closing this issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 removed a comment on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 removed a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-945308470


   give up.
   closing this issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-945313036


   rebase to the latest main.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895124301


   @uranusjr wrota about adding unit tests. 
   
   And also there is one serious caveat (and one that should be covered by the unit tests as well for sure is the DST change. UTC is not only easy to implement but guarantees conflict avoidance. With local time, if you have hourly scheduled DAG, you have a guaranteed conflict once a year - you will get the same run id conflict (when you move clock backwards). So unless you make sure that timezone is included and reflected properly, you might have a problem. 
   
   I think in this case, the time zone offset will be different when you use isoformat, but it definitely need testing (and what I mean by that unit testing) - so that we can avoid regressions and you actually have shown that you thought and tested that case consciously. 
   
   Also i think it would be great to have parameterized unit 
   test with a number of more or less probable and 'weird' timezones. There are certain timezones (with half or even quarter-hour shifts) so it would.be great to see those tested as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-897303309


   @ashb @kaxil @XD-DENG  The PR is ready for review.   
   I think this change is quite useful for us who are not using UTC time.
   Please review it when you are free. Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 edited a comment on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895111608


   > I kind of feel maybe we should just use this behaviour all the time. The run ID is only for identifying the run, and using UTC has no advantage beside being easy to implement.
   
   I agree that the `run ID` is only for identifying the run, but for the user, they may want to locate the specific run to check the dag logs, especially when debugging or troubleshooting, they may run the dag serval times manually. If the dag `run ID` is generated using the local time, it is really convenient for them.
   It will not increase the compatibility for our system to allow the user to localize the `run_id`, we will be very happy to see this change. 
   I've talked with some friends who are using the airflow, actually, most of them change the sources codes here and re-compile the project to launch.
   
   > Either way though, you need some tests to ensure this works.  
   
   I've tested the code changes, setting this value to `True`, `False`, and with this config missing. It works fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r689988346



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)
+            is_dst = local_time.is_dst()
+            if is_dst:
+                return f"{run_type}__{local_time.isoformat()}__DST"
+            else:
+                return f"{run_type}__{local_time.isoformat()}"
+        else:
+            return f"{run_type}__{execution_date.isoformat()}"

Review comment:
       Sure. Post the checks before I determine changing these
   
   As I checked the source codes, there may be two risks after changing these codes:
   1. As there is a UNIQUE_KEY(dag_id,run_id) in the database, we must make sure that the two values are unique. 
   1. If the user changes the server config `localize_dag_run_id` from `True` to `False` or they change the `default_timezone` frequently, the dag `run_id` may be conflict. 
   
   For the first risk issue, I use `pendulum` to make sure the local_times are unique during the DST changing. And I tried to did the full test and add the test codes into PR.
   For the second risk issue, I think it is only a low-rate case. Even the user changes the `default_timezone`, the dag `run_id` is accurate to six decimal places, it is really hard to make the `run_id` conflict.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895103646


   Either way though, you need some tests to ensure this works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 edited a comment on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895793877


   > I think the timezone part will change. For example, if you’re in England, 2am without DST (Greenwich Mean Time) would be 02:00:00+00:00, and 2am with DST (British Summer Time) would be 02:00:00+01:00.
   
   If that, I think it will work fine if we can make sure the DST offset is in the `run_id`.
   Or, we can add another filed to show whether the `run_id` is generated in DST, just like "manual__2021-08-08T08:08:08.217033__dst",if it is not in `DST`, the run_id is `manual__2021-08-08T08:08:08.217033`.
   For the area that doesn't use `DST`, we can add another config to disable the `__dst` appending.
   Do you think it is ok?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895789203


   I think the timezone part will change. For example, if you’re in England, 2am without DST (Greenwich Mean Time) would be `02:00:00+00:00`, and 2am with DST (British Summer Time) would be `02:00:00+01:00`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690407650



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       good idea. It will be a small change for the project.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-900929938


   Commit the new changes.
   There are two new changes:
   1. remove `__DST` suffix in the `run_id` as the offset will change when changing the DST, 
   2. After discussing @ashb , use dag `timezone` instead of `default_timezone` make sense,  we have three ways to do this:
       1. [X] We use the `execution_date` timezone directly. But I found the `execution_date` is UTC already, and if we transform it before calling the method `DagRun.generate_run_id`, the changes will have a big risk because it may affect the task scheduler,
       2. [X] change the method `DagRun.generate_run_id` from staticmethod to classmethod, and it leads that there must be a dag model before calling the method. 
       3. [V] add a parameter `dag_timezone` in the method `generate_run_id`. I scan the usage of this method in the project and we can get the dag_time in all the calling codes, and there are not too many actually. So I follow this way.
   
   Actually, I'm still not sure about the second change, we can continue the discussion about it here . 
   
   Thanks a lot 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895124301


   @uranusjr wrote about adding unit tests. 
   
   And also there is one serious caveat (and one that should be covered by the unit tests) as well for sure is the DST change. UTC is not only easy to implement but guarantees conflict avoidance. With local time, if you have hourly scheduled DAG, you have a guaranteed conflict once a year - you will get the same run id conflict (when you move clock backwards). So unless you make sure that timezone is included and reflected properly, you might have a problem. 
   
   I think in this case, the time zone offset will be different when you use isoformat, but it definitely need testing (and what I mean by that unit testing) - so that we can avoid regressions and you actually have shown that you thought and tested that case consciously. 
   
   Also i think it would be great to have parameterized unit 
   test with a number of more or less probable and 'weird' timezones. There are certain timezones (with half or even quarter-hour shifts) so it would.be great to see those tested as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 edited a comment on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-900929938


   Commit the new changes.
   There are two new changes:
   1. remove `__DST` suffix in the `run_id` as the offset will change when changing the DST, 
   2. After discussing @ashb , use dag `timezone` instead of `default_timezone` make sense,  we have three ways to do this:
       1.  We use the `execution_date` timezone directly. But I found the `execution_date` is UTC already, and if we transform it before calling the method `DagRun.generate_run_id`, the changes will have a big risk because it may affect the task scheduler. So give it up
       2. change the method `DagRun.generate_run_id` from staticmethod to classmethod, and it leads that there must be a dag model before calling the method. So give it up
       3. add a parameter `dag_timezone` in the method `generate_run_id`. I scan the usage of this method in the project and we can get the dag_time in all the calling codes, and there are not too many actually. So I follow this way.
   
   Actually, I'm still not sure about the second change, we can continue the discussion about it here . 
   
   Thanks a lot 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690458047



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       Just confirm that the `dag timezone` is the value that is inited here? 
   https://github.com/apache/airflow/blob/e99624d2f17f0aadcb992827e9e721f5e6c978d8/airflow/models/dag.py#L326




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895103249


   I kind of feel maybe we should just use this behaviour all the time. The run ID is only for identifying the run, and using UTC has no advantage beside being easy to implement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895880253


   I think `pendulum` works fine when dealing with the DST time.
   I've changed the codes and add the unit test for the local run_id


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 closed pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 closed pull request #17502:
URL: https://github.com/apache/airflow/pull/17502


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690458047



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   Just confirm that the `dag timezone` is the value that is inited here? 
   https://github.com/apache/airflow/blob/e99624d2f17f0aadcb992827e9e721f5e6c978d8/airflow/models/dag.py#L326
   Can we use the `tzinfo` in the `execution_date` directly? 
   Or we need to add a parameter `dag_timezone` in the method `generate_run_id` and set the value when calling this method?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895124301


   @uranusjr wrote about adding unit tests. 
   
   And also there is one serious caveat (and one that should be covered by the unit tests) as well for sure is the DST change. UTC is not only easy to implement but guarantees conflict avoidance. With local time, if you have hourly scheduled DAG, you have a guaranteed conflict once a year - you will get the same run id conflict (when you move clock backwards). So unless you make sure that timezone is included and reflected properly, you might have a problem. 
   
   I think in this case, the time zone offset will be different when you use isoformat, but it definitely needs testing (and what I mean by that unit testing) - so that we can avoid regressions and you actually have shown that you thought and tested that case consciously. 
   
   Also i think it would be great to have parameterized unit 
   test with a number of more or less probable and 'weird' timezones. There are certain timezones (with half or even quarter-hour shifts) so it would.be great to see those tested as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690458047



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   Just confirm that the `dag timezone` is the value that is inited here? 
   https://github.com/apache/airflow/blob/e99624d2f17f0aadcb992827e9e721f5e6c978d8/airflow/models/dag.py#L326
   Can we use the `tzinfo` in the `execution_date` directly? 
   Or we need to add a parameter `dag_timezone` in the method `generate_run_id` and set the value when calling this method?
   Or transform the date to dag timezone datetime outside?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690455602



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   I thought we can get the dag timezone by the `start_date` or `end_date`, but actually, the two values may be `None if the user doesn't set them.
   So the question is that which `timezone` do you mean? 
   The timezone of `execution_date`?  

##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       @ashb 
   I thought we can get the dag timezone by the `start_date` or `end_date`, but actually, the two values may be `None` if the user doesn't set them.
   So the question is that which `timezone` do you mean? 
   The timezone of `execution_date`?  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690458047



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       Just confirm that the `dag timezone` is the value that is inited here? 
   https://github.com/apache/airflow/blob/e99624d2f17f0aadcb992827e9e721f5e6c978d8/airflow/models/dag.py#L326
   Can we use the `tzinfo` in the `execution_date` directly? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r689988346



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)
+            is_dst = local_time.is_dst()
+            if is_dst:
+                return f"{run_type}__{local_time.isoformat()}__DST"
+            else:
+                return f"{run_type}__{local_time.isoformat()}"
+        else:
+            return f"{run_type}__{execution_date.isoformat()}"

Review comment:
       Sure. Post the checks before I determine changing these
   
   As I checked the source codes, there may be two risks after changing these codes:
   1. As there is a UNIQUE_KEY(dag_id,run_id) in the database, we must make sure that the two values are unique. 
   1. If the user changes the server config `localize_dag_run_id` from `True` to `False` or they change the `default_timezone` frequently, the dag `run_id` may be conflict. 
   
   For the first risk issue, I use `pendulum` to make sure the local_times are unique during the DST changing. And I tried to did the full test and add the test codes into PR.
   For the second issue, I think it is only a low-rate case. Even the user changes the `default_timezone`, the dag `run_id` is accurate to six decimal places, it is really hard to make the `run_id` conflict.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895336634


   I think local_run_id might be misleading - if people will see different value in the UI and in the DB that might be quite confusing. Plus it adds unnecesary field in the DB. The run_id is used throughout the whole UI and It would be quite a change to use 'local' version all over the UI. And it will be even harder to test. I agree with @uranusjr that using local timezone settings always is a good idea, but it has to be thoroughly unit-tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the dag_run

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-895670688


   > I think local_run_id might be misleading - if people will see different value in the UI and in the DB that might be quite confusing. Plus it adds unnecesary field in the DB. The run_id is used throughout the whole UI and It would be quite a change to use 'local' version all over the UI. And it will be even harder to test. I agree with @uranusjr that using local timezone settings always is a good idea, but it has to be thoroughly unit-tested.
   
   Agree.
   I've set this PR to WIP.
   I will add the unit test about it.
   Because we don't use the `winter time` and `summer time` in China,  I will be very glad if you can help me to do this if possible. Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-907965112


   rebased to latest main


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] LionelZhao28 commented on pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
LionelZhao28 commented on pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#issuecomment-896531519


   @uranusjr I think you may miss the PR comment message.
   I've added the unit test for local run_id, please review it when you are free.
   Thanks a lot 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #17502: localize the run_id in dagrun

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #17502:
URL: https://github.com/apache/airflow/pull/17502#discussion_r690286002



##########
File path: airflow/models/dagrun.py
##########
@@ -286,7 +287,17 @@ def find(
     @staticmethod
     def generate_run_id(run_type: DagRunType, execution_date: datetime) -> str:
         """Generate Run ID based on Run Type and Execution Date"""
-        return f"{run_type}__{execution_date.isoformat()}"
+        from airflow.configuration import conf
+        local_run_id = conf.getboolean("core", "localize_dag_run_id", fallback=False)
+        if local_run_id:
+            local_time = pendulum.instance(execution_date).astimezone(tz=settings.TIMEZONE)

Review comment:
       It should be based on the _dag_ timezone, not the local timezone.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org