You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/25 09:07:21 UTC

[GitHub] [airflow] uranusjr opened a new pull request #19821: Sanity check for MySQL's TIMESTAMP column

uranusjr opened a new pull request #19821:
URL: https://github.com/apache/airflow/pull/19821


   When using a sensor in reschedule mode, the poke_interval is used to calculate a new date to run the task. However, user errors may easily cause the entire scheduler to crash if that new date goes over MySQL's TIMESTAMP limit, which causes the database to return NULL later.
   
   This patch adds some very rudimentary checks to avoid these user errors as soon as possible. The checks are far from bullet-proof; to be actually resillient, Airflow should really migrate away from using the TIMESTAMP field altogether, which is a big but important topic on its own and needs to be handled in the next ten years.
   
   Close #19801.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#issuecomment-979174521


   Huh why is `TestPythonVirtualenvOperator::test_airflow_context` suddenly failing.
   
   Edit: Fix proposed in #19830.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#issuecomment-984627861


   Probably, but that’s out of scope of this PR. If code fails at that site, the invalid value already got into the database, which is probably too late already.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#issuecomment-979174521


   Huh why is `TestPythonVirtualenvOperator::test_airflow_context` suddenly failing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#issuecomment-979139993


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] manuzhang commented on pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
manuzhang commented on pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#issuecomment-984407415


   Do we need to protect `next_reschedule_date` at https://github.com/apache/airflow/blob/main/airflow/ti_deps/deps/ready_to_reschedule.py#L66 in case of any unknown errors?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#discussion_r756862089



##########
File path: airflow/sensors/base.py
##########
@@ -41,6 +42,13 @@
 # See  https://github.com/apache/airflow/issues/16035
 from airflow.utils.decorators import apply_defaults
 
+# As documented in https://dev.mysql.com/doc/refman/5.7/en/datetime.html.
+_MYSQL_TIMESTAMP_MAX = datetime.datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)
+
+
+def _is_metadatabase_mysql() -> bool:
+    return settings.engine and settings.engine.url.get_backend_name() == "mysql"

Review comment:
       Cache this? Move it in to settings.py?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ferruzzi commented on a change in pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on a change in pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#discussion_r760497630



##########
File path: tests/sensors/test_base.py
##########
@@ -498,6 +498,28 @@ def run_duration():
             assert interval2 >= sensor.poke_interval
             assert interval2 > interval1
 
+    @pytest.mark.backend("mysql")
+    def test_reschedule_poke_interval_too_long_on_mysql(self, make_sensor):
+        with pytest.raises(AirflowException) as ctx:
+            make_sensor(poke_interval=863998946, mode="reschedule", return_value="irrelevant")

Review comment:
       I love the immortalized poke_interval.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr merged pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
uranusjr merged pull request #19821:
URL: https://github.com/apache/airflow/pull/19821


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#discussion_r757105118



##########
File path: airflow/sensors/base.py
##########
@@ -41,6 +42,13 @@
 # See  https://github.com/apache/airflow/issues/16035
 from airflow.utils.decorators import apply_defaults
 
+# As documented in https://dev.mysql.com/doc/refman/5.7/en/datetime.html.
+_MYSQL_TIMESTAMP_MAX = datetime.datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)
+
+
+def _is_metadatabase_mysql() -> bool:
+    return settings.engine and settings.engine.url.get_backend_name() == "mysql"

Review comment:
       Adding cache. It doesn’t seem to worth putting in `settings.py` for me, unless we can find more use cases for the function.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #19821: Sanity check for MySQL's TIMESTAMP column

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #19821:
URL: https://github.com/apache/airflow/pull/19821#issuecomment-979977943


   Can you also rebase on top of main so your fix in https://github.com/apache/airflow/pull/19830 fixes the tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org