You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "xuganyu96 (via GitHub)" <gi...@apache.org> on 2023/06/10 23:03:42 UTC

[GitHub] [airflow] xuganyu96 opened a new pull request, #31836: Add --retry and --retry-delay to "airflow db check"

xuganyu96 opened a new pull request, #31836:
URL: https://github.com/apache/airflow/pull/31836

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #31818
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   This PR closes #31818. It is still a work in progress with many to-do's pending:
   
   - [ ] Adding unit tests if necessary
   - [ ] Run through unit tests and CI/CD
   - [ ] Updating the documentation, particularly CHANGELOG and Airflow's website
   
   This PR made significant refactor to `airflow.utils.db.check` function, which I felt safe to do because I found this function to only be used by the `db check` command, and which I felt it not an API used externally. Please let me know if this approach is not satisfactory.
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1227687809


##########
airflow/cli/commands/db_command.py:
##########
@@ -187,9 +188,21 @@ def shell(args):
 
 
 @cli_utils.action_cli(check_db=False)
-def check(_):
+def check(args):
     """Runs a check command that checks if db is available."""
-    db.check()
+    retries: int = args.retry

Review Comment:
   yeah. using tenacity (especially that we use it, is a good idea here). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] xuganyu96 commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "xuganyu96 (via GitHub)" <gi...@apache.org>.
xuganyu96 commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1228454463


##########
airflow/cli/commands/db_command.py:
##########
@@ -187,9 +188,21 @@ def shell(args):
 
 
 @cli_utils.action_cli(check_db=False)
-def check(_):
+def check(args):
     """Runs a check command that checks if db is available."""
-    db.check()
+    retries: int = args.retry

Review Comment:
   Thank you for the suggestion. Will work on incorporating `tenacity` right away.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] utkarsharma2 commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "utkarsharma2 (via GitHub)" <gi...@apache.org>.
utkarsharma2 commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1228841822


##########
airflow/cli/commands/db_command.py:
##########
@@ -187,9 +188,18 @@ def shell(args):
 
 
 @cli_utils.action_cli(check_db=False)
-def check(_):
+def check(args):
     """Runs a check command that checks if db is available."""
-    db.check()
+    retries: int = args.retry
+    retry_delay: int = args.retry_delay
+
+    for attempt in Retrying(

Review Comment:
   Yup, there is no need to test for tenacity code. 
   
   Also, reraising the original exception is vital as it gives users a hint of what needs to be done next, whereas RetryError would be meaningless to them. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #31836:
URL: https://github.com/apache/airflow/pull/31836#issuecomment-1592639263

   Awesome work, congrats on your first merged pull request! You are invited to check our [Issue Tracker](https://github.com/apache/airflow/issues) for additional contributions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk merged PR #31836:
URL: https://github.com/apache/airflow/pull/31836


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] utkarsharma2 commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "utkarsharma2 (via GitHub)" <gi...@apache.org>.
utkarsharma2 commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1227637301


##########
airflow/cli/cli_config.py:
##########
@@ -500,6 +500,18 @@ def string_lower_type(val):
     help="Drop the archive tables after exporting. Use with caution.",
     action="store_true",
 )
+ARG_DB_RETRY = Arg(
+    ("--retry",),
+    default=0,
+    type=positive_int(allow_zero=True),
+    help="Retry database check upon failure",
+)
+ARG_DB_RETRY_DELAY = Arg(
+    ("--retry-delay",),
+    default=1,

Review Comment:
   I think we should give the database more time to breathe. :) 
   
   Would it make sense to increase the default value to 30 or something more?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] xuganyu96 commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "xuganyu96 (via GitHub)" <gi...@apache.org>.
xuganyu96 commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1228518699


##########
airflow/cli/commands/db_command.py:
##########
@@ -187,9 +188,18 @@ def shell(args):
 
 
 @cli_utils.action_cli(check_db=False)
-def check(_):
+def check(args):
     """Runs a check command that checks if db is available."""
-    db.check()
+    retries: int = args.retry
+    retry_delay: int = args.retry_delay
+
+    for attempt in Retrying(

Review Comment:
   @potiuk @utkarsharma2 
   Since `tenacity.retry` relies on unhandled exceptions for indicating whether a function call (or a code block in this case) is successful or not, I've reverted the implementation of `airflow.utils.db.check` back to its original state (letting the SQLAlchemy session raise unhandled OperationalError or else). Consequently, there is no longer any need to unit test this util function.
   
   I chose to set `reraise=True` so that the original error from `airflow.utils.db.check` can be surfaced. Please advise whether we should raise `RetryError` or the original error. Thank you.
   
   Last but not least, the unit test for `db_command.check` is updated accordingly. I am not 100% sure if testing the retry logic implemented in `tenacity.retry` is all that helpful, but the test itself still mocks `time.sleep` and `utils.db.check` and test that they are called for the correct number of times.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1227687084


##########
airflow/cli/cli_config.py:
##########
@@ -500,6 +500,18 @@ def string_lower_type(val):
     help="Drop the archive tables after exporting. Use with caution.",
     action="store_true",
 )
+ARG_DB_RETRY = Arg(
+    ("--retry",),
+    default=0,
+    type=positive_int(allow_zero=True),
+    help="Retry database check upon failure",
+)
+ARG_DB_RETRY_DELAY = Arg(
+    ("--retry-delay",),
+    default=1,

Review Comment:
   The db check retry here is really targetted to see if the database is "up" already - nothing more.
   
   I think 1s is quite good as a default. We do not do a LOT - we just try to connect to the database which takes milliseconds usually.  And if it does not work, then it usually means that the database is not yet ready and we will simply fail on failed "connection" establishing - nothing that makes it "heavy". 
   
   The longer the delay is, the longer we will delay startup if this command is used as a prerequisite to start airflow component (and this is the intention).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] utkarsharma2 commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "utkarsharma2 (via GitHub)" <gi...@apache.org>.
utkarsharma2 commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1227772564


##########
airflow/cli/cli_config.py:
##########
@@ -500,6 +500,18 @@ def string_lower_type(val):
     help="Drop the archive tables after exporting. Use with caution.",
     action="store_true",
 )
+ARG_DB_RETRY = Arg(
+    ("--retry",),
+    default=0,
+    type=positive_int(allow_zero=True),
+    help="Retry database check upon failure",
+)
+ARG_DB_RETRY_DELAY = Arg(
+    ("--retry-delay",),
+    default=1,

Review Comment:
   Sure, that makes sense. :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] utkarsharma2 commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "utkarsharma2 (via GitHub)" <gi...@apache.org>.
utkarsharma2 commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1227648936


##########
airflow/cli/commands/db_command.py:
##########
@@ -187,9 +188,21 @@ def shell(args):
 
 
 @cli_utils.action_cli(check_db=False)
-def check(_):
+def check(args):
     """Runs a check command that checks if db is available."""
-    db.check()
+    retries: int = args.retry

Review Comment:
   You can also consider using - https://tenacity.readthedocs.io/en/latest/ for retries it gives you many options 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on code in PR #31836:
URL: https://github.com/apache/airflow/pull/31836#discussion_r1230650328


##########
airflow/cli/commands/db_command.py:
##########
@@ -187,9 +188,18 @@ def shell(args):
 
 
 @cli_utils.action_cli(check_db=False)
-def check(_):
+def check(args):
     """Runs a check command that checks if db is available."""
-    db.check()
+    retries: int = args.retry
+    retry_delay: int = args.retry_delay
+
+    for attempt in Retrying(

Review Comment:
   Yes. That's the nice thing about using small and proven utils. No need to test them :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #31836: Add --retry and --retry-delay to "airflow db check"

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #31836:
URL: https://github.com/apache/airflow/pull/31836#issuecomment-1592639575

   :tada: @xuganyu96 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org