You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/08 18:37:10 UTC

[GitHub] [airflow] thsubramani opened a new pull request #21431: [scheduler HA]catch sqlalchemy leaks LockNotAvailable Exception in scheduler to have multiple scheduler running

thsubramani opened a new pull request #21431:
URL: https://github.com/apache/airflow/pull/21431


   
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   catch sqlalchemy leaks LockNotAvailable Exception in scheduler to have multiple scheduler running
   
   Issues details:
   
   https://github.com/apache/airflow/issues/20695
   
   Discussion detials:
   
   https://github.com/apache/airflow/discussions/21038
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1034539676


   IIRC SQLALchemy does not wrap OperationalError for bare `execute` calls. Not sure. Where is `LockNotAvailable` raised from?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1040241896


   Hey @thsubramani - are you rebasing/applying the comments soon ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#discussion_r802094278



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -866,10 +866,26 @@ def _do_scheduling(self, session) -> int:
                 # Make sure we only sent this metric if we obtained the lock, otherwise we'll skew the
                 # metric, way down
                 timer.stop(send=True)
-            except OperationalError as e:
+            except (OperationalError, LockNotAvailable) as e:

Review comment:
       Thirdly: you aren't importing this from anywhere.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1033112381


   > Are you sure this is actually how the error is thrown? In my testing, the psycopg2.LockNotAvailable exception is caught by SQLAlchemy, and wrapped in an OperationalError.
   
   Yeah. This is still a surprise for me why it has not been converted to the OperationalError.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1034841394


   That could indeed explain it @uranusjr :
   
   It was this part of the code. It was raised in the `Pool.slots_stats(lock_rows=True` but it is just "after" the advisory lock execute that looks like might be the root cause (though - at least in theory) the lock SHOULD be acquired in this case. . 
   
   ```
           if session.get_bind().dialect.name == "postgresql":
               # Optimization: to avoid littering the DB errors of "ERROR: canceling statement due to lock
               # timeout", try to take out a transactional advisory lock (unlocks automatically on
               # COMMIT/ROLLBACK)
               lock_acquired = session.execute(
                   text("SELECT pg_try_advisory_xact_lock(:id)").bindparams(
                       id=DBLocks.SCHEDULER_CRITICAL_SECTION.value
                   )
               ).scalar()
               if not lock_acquired:
                   # Throw an error like the one that would happen with NOWAIT
                   raise OperationalError(
                       "Failed to acquire advisory lock", params=None, orig=RuntimeError('55P03')
                   )
   
           # Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
           # Throws an exception if lock cannot be obtained, rather than blocking
           pools = models.Pool.slots_stats(lock_rows=True, session=session)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#discussion_r803604482



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -866,10 +866,26 @@ def _do_scheduling(self, session) -> int:
                 # Make sure we only sent this metric if we obtained the lock, otherwise we'll skew the
                 # metric, way down
                 timer.stop(send=True)
-            except OperationalError as e:
+            except (OperationalError, LockNotAvailable) as e:

Review comment:
       Yeah 'try/except ImportError` should be nicely handled here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] thsubramani commented on a change in pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
thsubramani commented on a change in pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#discussion_r802960503



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -866,10 +866,26 @@ def _do_scheduling(self, session) -> int:
                 # Make sure we only sent this metric if we obtained the lock, otherwise we'll skew the
                 # metric, way down
                 timer.stop(send=True)
-            except OperationalError as e:
+            except (OperationalError, LockNotAvailable) as e:

Review comment:
       will try to to use `is_lock_not_available_error` and push the changes 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] thsubramani commented on a change in pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
thsubramani commented on a change in pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#discussion_r802960015



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -866,10 +866,26 @@ def _do_scheduling(self, session) -> int:
                 # Make sure we only sent this metric if we obtained the lock, otherwise we'll skew the
                 # metric, way down
                 timer.stop(send=True)
-            except OperationalError as e:
+            except (OperationalError, LockNotAvailable) as e:

Review comment:
       yes not sure why its not converting to OperationalError. but i have tested mutliple times and behaviour is same.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1034844471


   But y'now: `In theory, practice is the same as theory, but in practice it's not`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #21431: [scheduler HA]catch sqlalchemy leaks LockNotAvailable Exception in scheduler to have multiple scheduler running

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1032937178


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, itโ€™s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better ๐Ÿš€.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1034841394


   That could indeed explain it @uranusjr :
   
   It was this part of the code. It was raised in the `Pool.slots_stats(lock_rows=True` but it is just "after" the advisory lock execute that looks like might be the root cause though - (at least in theory) the lock SHOULD be acquired in this case.
   
   ```
           if session.get_bind().dialect.name == "postgresql":
               # Optimization: to avoid littering the DB errors of "ERROR: canceling statement due to lock
               # timeout", try to take out a transactional advisory lock (unlocks automatically on
               # COMMIT/ROLLBACK)
               lock_acquired = session.execute(
                   text("SELECT pg_try_advisory_xact_lock(:id)").bindparams(
                       id=DBLocks.SCHEDULER_CRITICAL_SECTION.value
                   )
               ).scalar()
               if not lock_acquired:
                   # Throw an error like the one that would happen with NOWAIT
                   raise OperationalError(
                       "Failed to acquire advisory lock", params=None, orig=RuntimeError('55P03')
                   )
   
           # Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
           # Throws an exception if lock cannot be obtained, rather than blocking
           pools = models.Pool.slots_stats(lock_rows=True, session=session)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] thsubramani edited a comment on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
thsubramani edited a comment on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1047257096


   > Hey @thsubramani - are you rebasing/applying the comments soon ?
   
   yes sure, sorry was in vacation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1034841394


   That could indeed explain it @uranusjr :
   
   It was this part of the code. It was raised in the `Pool.slots_stats(lock_rows=True` but it is just "after" the advisory lock execute that looks like might be the root cause. 
   
   ```
           if session.get_bind().dialect.name == "postgresql":
               # Optimization: to avoid littering the DB errors of "ERROR: canceling statement due to lock
               # timeout", try to take out a transactional advisory lock (unlocks automatically on
               # COMMIT/ROLLBACK)
               lock_acquired = session.execute(
                   text("SELECT pg_try_advisory_xact_lock(:id)").bindparams(
                       id=DBLocks.SCHEDULER_CRITICAL_SECTION.value
                   )
               ).scalar()
               if not lock_acquired:
                   # Throw an error like the one that would happen with NOWAIT
                   raise OperationalError(
                       "Failed to acquire advisory lock", params=None, orig=RuntimeError('55P03')
                   )
   
           # Get the pool settings. We get a lock on the pool rows, treating this as a "critical section"
           # Throws an exception if lock cannot be obtained, rather than blocking
           pools = models.Pool.slots_stats(lock_rows=True, session=session)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#discussion_r803274098



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -866,10 +866,26 @@ def _do_scheduling(self, session) -> int:
                 # Make sure we only sent this metric if we obtained the lock, otherwise we'll skew the
                 # metric, way down
                 timer.stop(send=True)
-            except OperationalError as e:
+            except (OperationalError, LockNotAvailable) as e:

Review comment:
       We also need to run without psycopg2 installed (i.e. if the user is on mysql then psycopg2 won't be around to import from)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1040241896


   Hey @thsubramani - are you rebasing/applying the comments soon ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] thsubramani commented on pull request #21431: [scheduler HA]catch sqlalchemy leaking LockNotAvailable Exception during multiple scheduler instances comes up

Posted by GitBox <gi...@apache.org>.
thsubramani commented on pull request #21431:
URL: https://github.com/apache/airflow/pull/21431#issuecomment-1047257096


   > Hey @thsubramani - are you rebasing/applying the comments soon ?
   
   yes sure, sorry was in vocation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org