Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/21 08:45:58 UTC

[GitHub] [airflow] sorabhgit opened a new issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit opened a new issue #17127:
URL: https://github.com/apache/airflow/issues/17127


   ### Discussed in https://github.com/apache/airflow/discussions/17126
   
   <div type='discussions-op-text'>
   
   <sup>Originally posted by **sorabhgit** July 21, 2021</sup>
   Hello guys, I am also struggling with an issue while setting up scheduler HA with Airflow 2.1.0.
   
   I've installed the Airflow scheduler on 2 separate nodes, both pointing to the same MySQL 8 database, but I get the error below in one of the scheduler logs:
   
   Steps to reproduce:
   1. Install Airflow 2.1.0 on 2 nodes using MySQL 8.0.25.
   2. Set `use_row_level_locking = True` in airflow.cfg on both nodes.
   3. Start the scheduler, webserver and Celery worker on node1, and only the scheduler on node2.
   4. Run any example DAG; one of the schedulers exits with the error below.
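
Before reading the traceback, it is worth confirming that both nodes really share one metadata DB URL and have row-level locking enabled. The sketch below is a minimal, hypothetical helper (`check_ha_config` is not part of Airflow); it assumes the Airflow 2.1 layout, where the DB URL lives in `[core] sql_alchemy_conn` and the flag in `[scheduler] use_row_level_locking` (default `True`). Identical URLs only show that both nodes use the same endpoint, not that the same physical server answers behind it.

```python
import configparser


def check_ha_config(cfg_texts):
    """Compare the airflow.cfg contents of several nodes; return a list of problems."""
    problems = []
    urls = set()
    for node, text in cfg_texts.items():
        # interpolation=None: airflow.cfg may contain literal '%' (e.g. log formats).
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        # Metadata DB URL, read from the [core] section in Airflow 2.1.
        urls.add(parser.get("core", "sql_alchemy_conn", fallback=None))
        # HA schedulers rely on SELECT ... FOR UPDATE; Airflow's default is True.
        if not parser.getboolean("scheduler", "use_row_level_locking", fallback=True):
            problems.append(f"{node}: use_row_level_locking is disabled")
    if len(urls) != 1:
        problems.append(f"nodes disagree on sql_alchemy_conn: {sorted(map(str, urls))}")
    return problems
```

Read each node's file (for example with `pathlib.Path("airflow.cfg").read_text()`) and pass a `{node_name: text}` dict; an empty list means no mismatch was found.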
   
   ```
   [2021-07-01 08:15:04,342] {scheduler_job.py:1302} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib64/python3.6/site-packages/mysql/connector/connection_cext.py", line 337, in get_rows
       else self._cmysql.fetch_row()
   _mysql_connector.MySQLInterfaceError: Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       cursor, statement, parameters, context
     File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
     File "/usr/local/lib64/python3.6/site-packages/mysql/connector/cursor_cext.py", line 277, in execute
       self._handle_result(result)
     File "/usr/local/lib64/python3.6/site-packages/mysql/connector/cursor_cext.py", line 172, in _handle_result
       self._handle_resultset()
     File "/usr/local/lib64/python3.6/site-packages/mysql/connector/cursor_cext.py", line 671, in _handle_resultset
       self._rows = self._cnx.get_rows()[0]
     File "/usr/local/lib64/python3.6/site-packages/mysql/connector/connection_cext.py", line 368, in get_rows
       sqlstate=exc.sqlstate)
   mysql.connector.errors.DatabaseError: 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.

   The above exception was the direct cause of the following exception:

   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1284, in _execute
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1546, in _do_scheduling
       num_queued_tis = self._critical_section_execute_task_instances(session=session)
     File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1142, in _critical_section_execute_task_instances
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 900, in _executable_task_instances_to_queued
       pools = models.Pool.slots_stats(lock_rows=True, session=session)
   ```
   
   </div>
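
The key line is MySQL error 3572: the row-locking query issued from `Pool.slots_stats(lock_rows=True)` ran with `NOWAIT`, found the pool rows already locked (presumably by the other scheduler), and aborted instead of waiting. SQLite has neither row locks nor `NOWAIT`, so the runnable sketch below is only an analogy: with a busy timeout of 0, SQLite's database-wide write lock fails the second contender immediately in the same way. The function name and the `slot_pool` table contents are illustrative.

```python
import os
import sqlite3
import tempfile


def demo_nowait_contention():
    """Two 'schedulers' contend for a write lock; the second fails at once instead of waiting."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "airflow_demo.db")
        # timeout=0 disables SQLite's busy wait, loosely mimicking NOWAIT.
        # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
        sched1 = sqlite3.connect(path, timeout=0, isolation_level=None)
        sched2 = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            sched1.execute("CREATE TABLE slot_pool (pool TEXT PRIMARY KEY, slots INTEGER)")
            sched1.execute("INSERT INTO slot_pool VALUES ('default_pool', 128)")

            sched1.execute("BEGIN IMMEDIATE")  # scheduler 1 enters its critical section
            try:
                sched2.execute("BEGIN IMMEDIATE")  # scheduler 2 tries at the same moment
                second_result = "acquired"
            except sqlite3.OperationalError as exc:
                second_result = f"failed: {exc}"
            sched1.execute("COMMIT")  # scheduler 1 releases the lock

            sched2.execute("BEGIN IMMEDIATE")  # now scheduler 2 gets it
            sched2.execute("COMMIT")
            return second_result
        finally:
            sched1.close()
            sched2.close()
```

Here `demo_nowait_contention()` returns `"failed: database is locked"`: the second "scheduler" gets an error immediately rather than blocking until the first one commits.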


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] thsubramani commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

Posted by GitBox <gi...@apache.org>.
thsubramani commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-1017837253


   @potiuk any suggestions would be really appreciated. Thanks.



[GitHub] [airflow] thsubramani commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

thsubramani commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-1006085341


   @sorabhgit did you try with a Postgres DB? Any luck with that?



[GitHub] [airflow] potiuk edited a comment on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk edited a comment on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884831959


   Just a note: the discussion at https://github.com/apache/airflow/discussions/14788 indicates that the problem might be that you are connecting Airflow to a DB in Active/Active HA mode. This is NOT supported. The database must be in Active/Passive HA mode: you either have to run your DB in Active/Passive mode, or explicitly configure Airflow to talk to only one DB server from the cluster.
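
Why Active/Active breaks lock-based coordination can be shown with a deliberately simplified model (an assumption, not how any specific product works): each physical DB node keeps its own row-lock table, so two schedulers routed to different nodes can both "win" the lock on the same `slot_pool` row. Real multi-primary systems resolve such conflicts in their own ways, often only at commit time. `DbNode` and `schedulers_both_lock` are hypothetical names.

```python
import threading


class DbNode:
    """One physical database server with its own, purely local, row-lock table."""

    def __init__(self, name):
        self.name = name
        self._guard = threading.Lock()
        self._row_locks = {}  # row key -> owner currently holding the lock

    def lock_row_nowait(self, row, owner):
        """Return True if `owner` holds the row lock, False if someone else already does."""
        with self._guard:
            holder = self._row_locks.setdefault(row, owner)
            return holder == owner


def schedulers_both_lock(node_for_s1, node_for_s2, row="slot_pool:default_pool"):
    """True if both schedulers believe they hold the same row lock at the same time."""
    got1 = node_for_s1.lock_row_nowait(row, "scheduler-1")
    got2 = node_for_s2.lock_row_nowait(row, "scheduler-2")
    return got1 and got2
```

With one shared node, `schedulers_both_lock(node, node)` is `False` (the second scheduler is refused); with two independent nodes it is `True`, meaning both schedulers would go on to queue tasks against the same pool slots.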



[GitHub] [airflow] sorabhgit commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884780351


   @namjals Yes, I am able to perform SELECT and INSERT on the MySQL DB.
   
   ```pycon
   >>> # Get a cursor
   ... cur = cnx.cursor()
   >>>
   >>> # Execute a query
   ... cur.execute("SELECT CURDATE()")
   >>>
   >>> # Fetch one result
   ... row = cur.fetchone()
   >>> print("Current date is: {0}".format(row[0]))
   Current date is: 2021-07-22
   >>>
   >>> # Close connection
   ... cnx.close()
   >>>
   ```



[GitHub] [airflow] potiuk edited a comment on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk edited a comment on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884833816


   @ashb something that we need to discuss when you return: it seems (needs confirmation) that some people connect Airflow HA schedulers to a DB in Active/Active mode, and it causes the locking problem (MySQL in this case, as usual).
   
   I think we might want to be more explicit in Airflow about that, detect it and inform the user (better), or possibly implement support for Active/Active mode (the best, but it might not be possible/easy). Happy to have a discussion on it when you are back from holidays ;)
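
One way the "detect it and inform the user" option could work is sketched below; it is purely illustrative and hypothetical, not an Airflow API. The idea: open several fresh connections through the configured URL and ask each one which server answered. On MySQL, `SELECT @@server_uuid` returns a per-server identifier, so more than one distinct value suggests a proxy or cluster is spreading connections across several servers. `open_connection` stands for any zero-argument DB-API connection factory. A single value does not prove the setup is safe (a proxy may pin connections), so this is a heuristic only.

```python
from typing import Callable, Set


def distinct_backends(open_connection: Callable[[], object], identity_sql: str,
                      samples: int = 10) -> Set[object]:
    """Open `samples` fresh connections and collect the server identity each one reports."""
    identities = set()
    for _ in range(samples):
        conn = open_connection()  # a new connection each time, so a proxy may route it anywhere
        try:
            cur = conn.cursor()
            cur.execute(identity_sql)
            identities.add(cur.fetchall()[0][0])
            cur.close()
        finally:
            conn.close()
    return identities


def warn_if_multi_primary(identities: Set[object]) -> str:
    """Return a warning message when more than one physical server was reached."""
    if len(identities) > 1:
        return (f"Connections reached {len(identities)} different DB servers; "
                "scheduler HA relies on row locks held on a single server.")
    return ""
```

Against MySQL this might be called as `distinct_backends(lambda: mysql.connector.connect(...), "SELECT @@server_uuid")`, with the connection arguments filled in for the deployment at hand.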



[GitHub] [airflow] sorabhgit edited a comment on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit edited a comment on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884780351


   @namjals Yes, I am able to perform SELECT and INSERT on the MySQL DB.
   
   ```pycon
   >>> # Get a cursor
   ... cur = cnx.cursor()
   >>>
   >>> # Execute a query
   ... cur.execute("SELECT CURDATE()")
   >>>
   >>> # Fetch one result
   ... row = cur.fetchone()
   >>> print("Current date is: {0}".format(row[0]))
   Current date is: 2021-07-22
   >>>
   >>> # Close connection
   ... cnx.close()
   >>>
   ```



[GitHub] [airflow] potiuk edited a comment on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk edited a comment on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884857501


   We support Scheduler HA (running more than one scheduler): https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html?highlight=scheduler%20ha#running-more-than-one-scheduler. Our scheduler runs in Active/Active mode (which means that both schedulers are parsing DAGs at the same time). This is supported with MySQL 8+ and *should* work (of course there might be some edge cases, but generally we tested it and it works).
   
   This is, of course, very different from Database HA, which is outside the realm of Airflow and is handled by your deployment. From the very beginning we have developed Airflow 2 with the assumption that the database runs at most in Active/Passive mode. The comment in #14788 indicated that someone had a similar problem when running the DB in Active/Active mode (and there, switching to talk directly to only one physical DB helped). So my assumption was that one of the reasons might be that you have a similar setup.
   
   Also, we've seen similar problems with various proxies that provided a kind of poor man's DB HA, where the proxy had several physical DB clusters behind it. We base our scheduler HA heavily on database locking, and locking is a hard problem to solve in an Active/Active setup.
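
Because scheduler HA leans on database locks, a scheduler that loses a lock race should ideally treat it as "another scheduler is busy" and retry on its next loop rather than exit. A minimal sketch of that pattern under stated assumptions: `LockBusyError` is a hypothetical stand-in for the driver's lock error, and `critical_section` is any callable returning the number of task instances it queued. This is not Airflow's implementation; whether and how a given Airflow version does this depends on the version and the DB driver.

```python
class LockBusyError(Exception):
    """Hypothetical stand-in for a driver's 'lock not available (NOWAIT)' error."""


def run_scheduler_loop(iterations, critical_section, log):
    """Run the loop `iterations` times; a busy critical section is skipped, not fatal."""
    queued = 0
    for i in range(iterations):
        try:
            queued += critical_section()
        except LockBusyError:
            # Another scheduler holds the pool rows right now. A real implementation
            # would also roll back its DB transaction here before the next attempt.
            log.append(f"loop {i}: critical section busy, skipping")
    return queued
```

The design choice is that lock contention is an expected, transient condition in Active/Active scheduling, so it is logged and skipped instead of being propagated as a fatal error.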
   
   That leads to the suggestion that this might be a similar case for you. If it is not, and you are 100% sure that you have a single physical DB behind, then the problem needs deeper investigation; it will take quite some time to resolve, and possibly some iterations here to find the reason (because we have not seen it in our tests).
   
   So if you are 100% sure you do not have multiple DBs being accessed at the same time (even if a single proxy is used), then my advice would be to switch to Postgres, as it might take quite a lot of time to find the cause (we've seen it in the past: sometimes people used customized versions of the databases with some functionality disabled, for example). Postgres is much more stable and less configurable (MySQL, for example, can have multiple engines with different capabilities), and there might be many other reasons why MySQL (especially a custom-configured one) creates problems.
   
   Unfortunately, we do not have the capacity in the community to investigate individual users' cases deeply, so unless you have the time and capacity to investigate it yourself and provide more information, I am afraid it might take quite some time even to reproduce the kind of problem you have.
   
   Going with Postgres is a much more "certain" route, and if timing matters to you, I'd heartily recommend it.



[GitHub] [airflow] potiuk commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-1019564633


   I believe your Python installation with packages is broken. It looks like a psycopg2 (Postgres library) problem. I suggest trying to reinstall Postgres from scratch. I have never seen a similar problem before, but it's something really low-level.



[GitHub] [airflow] potiuk commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884825885


   1) I think for now you can use a single scheduler until the problem is diagnosed and fixed. I am not sure we will be able to diagnose it and find its root cause quickly, and it will almost certainly require a change in Airflow, which will likely take weeks or months to release. So there is little chance that you will be unblocked quickly without working around it.
   
   2) I will keep repeating it: MySQL has a LOT of problems compared to Postgres. Locking, encoding, stability, you name it. If you still can, switch to Postgres.
   
   



[GitHub] [airflow] namjals commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

namjals commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884149426


   It looks like you are not getting the lock.
   How about connecting to the installed MySQL server using the connector below and checking whether queries such as SELECT and INSERT work?
   https://github.com/mysql/mysql-connector-python
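
The check suggested above can be wrapped into a small function that works with any DB-API 2.0 connection. A minimal sketch: `smoke_test` and the temporary table name `airflow_smoke_test` are made up for illustration. It only proves connectivity and basic SELECT/INSERT permissions; it does not exercise the `FOR UPDATE NOWAIT` row locks the scheduler uses.

```python
def smoke_test(conn):
    """Run a SELECT and an INSERT through a DB-API 2.0 connection; return the rows read back."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT 1")
        # fetchall() consumes the whole result set before the next statement.
        if cur.fetchall()[0][0] != 1:
            raise RuntimeError("SELECT 1 returned an unexpected value")
        # A temporary table avoids touching Airflow's own tables; it disappears
        # when the connection is closed.
        cur.execute("CREATE TEMPORARY TABLE airflow_smoke_test (id INTEGER)")
        cur.execute("INSERT INTO airflow_smoke_test (id) VALUES (1)")
        cur.execute("SELECT COUNT(*) FROM airflow_smoke_test")
        return cur.fetchall()[0][0]
    finally:
        cur.close()
```

With mysql-connector-python this would be called as `smoke_test(mysql.connector.connect(host=..., user=..., password=..., database=...))`; the same function runs against `sqlite3.connect(":memory:")` for a quick local check.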



[GitHub] [airflow] thsubramani commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

thsubramani commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-1005975941


   @potiuk I have tried with Postgres 13 and am getting the errors below (from `airflow-c-qa4-64956645c-tr4zm airflow-scheduler`):
   
   ```
   [2022-01-05 17:18:26,109] {scheduler_job.py:721} INFO - Exited execute loop
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context
       cursor, statement, parameters, context
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.LockNotAvailable: could not obtain lock on row in relation "slot_pool"

   The above exception was the direct cause of the following exception:

   Traceback (most recent call last):
     File "/home/airflow/.local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
       args.func(args)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 91, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/scheduler_command.py", line 70, in scheduler
       job.run()
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 694, in _execute
       self._run_scheduler_loop()
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 787, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 926, in _do_scheduling
       num_queued_tis = self._critical_section_execute_task_instances(session=session)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 550, in _critical_section_execute_task_instances
       queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 307, in _executable_task_instances_to_queued
       pools = models.Pool.slots_stats(lock_rows=True, session=session)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/pool.py", line 107, in slots_stats
       pool_rows: Iterable[Tuple[str, int]] = query.all()
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3319, in all
       return list(self)
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3481, in __iter__
       return self._execute_and_instances(context)
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 3506, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1020, in execute
       return meth(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_clauseelement
       distilled_params,
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1324, in _execute_context
       e, statement, parameters, cursor, context
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1515, in _handle_dbapi_exception
       util.raise_(newraise, with_traceback=exc_info[2], from_=e)
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context
       cursor, statement, parameters, context
     File "/home/airflow/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
       cursor.execute(statement, parameters)
   AttributeError: 'PGExecutionContext_psycopg2' object has no attribute '_stan_scope'
   ```
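
The MySQL and Postgres tracebacks in this thread hit the same kind of error, reported differently by each driver: PostgreSQL's SQLSTATE `55P03` (`lock_not_available`, surfaced by psycopg2 as `LockNotAvailable` with a `.pgcode` attribute) and MySQL error `3572` (surfaced by mysql-connector-python with an `.errno` attribute). A minimal sketch of a driver-agnostic check, assuming SQLAlchemy-wrapped errors expose the driver exception as `.orig`, and assuming mysqlclient puts the error number first in `.args`; `is_lock_not_available` is a hypothetical helper, not an Airflow or SQLAlchemy API.

```python
# PostgreSQL SQLSTATE for lock_not_available; psycopg2 exposes it as `.pgcode`.
PG_LOCK_NOT_AVAILABLE = "55P03"
# MySQL error number seen in this thread ("3572 (HY000): ... NOWAIT is set").
MYSQL_LOCK_NOWAIT = 3572


def is_lock_not_available(exc):
    """Return True if `exc` (or the driver error it wraps) means 'lock not available'."""
    # SQLAlchemy's DBAPIError keeps the original driver exception in `.orig`.
    orig = getattr(exc, "orig", exc)
    if getattr(orig, "pgcode", None) == PG_LOCK_NOT_AVAILABLE:
        return True
    # mysql-connector-python sets `.errno`.
    if getattr(orig, "errno", None) == MYSQL_LOCK_NOWAIT:
        return True
    # mysqlclient (MySQLdb) is assumed to put the error number first in `.args`.
    args = getattr(orig, "args", ())
    return bool(args) and args[0] == MYSQL_LOCK_NOWAIT
```

The final `AttributeError` in the Postgres log is a different exception and would not match this check.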



[GitHub] [airflow] sorabhgit commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884844265


   @potiuk Thanks again for your response. Just wanted to confirm: is Airflow 2.x Scheduler HA (where each scheduler is fully 'active') supported with MySQL 8+?
   



[GitHub] [airflow] boring-cyborg[bot] commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

boring-cyborg[bot] commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884008943


   Thanks for opening your first issue here! Be sure to follow the issue template!
   



[GitHub] [airflow] potiuk commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884827229


   If you need an HA scheduler, switching to Postgres is really the fastest route.



[GitHub] [airflow] potiuk commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884833816


   @ashb something that we need to discuss when you return: it seems (needs confirmation) that some people connect Airflow to a DB in Active/Active mode, and it causes the locking problem (MySQL in this case, as usual).
   
   I think we might want to be more explicit in Airflow about that, detect it (better), or possibly implement support for Active/Active mode (the best, but it might not be possible/easy). Happy to have a discussion on it when you are back from holidays ;)



[GitHub] [airflow] potiuk commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884857501


   
   We support Scheduler HA (running more than one scheduler): https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html?highlight=scheduler%20ha#running-more-than-one-scheduler. Our scheduler runs in Active/Active mode (which means that both schedulers are parsing DAGs at the same time). This is supported with MySQL 8+ and *should* work (of course there might be some edge cases, but generally we tested it and it works).
   
   This is, of course, very different from Database HA, which is outside the realm of Airflow and is handled by your deployment. From the very beginning we have developed Airflow 2 with the assumption that the database runs at most in Active/Passive mode. The comment in #14788 indicated that someone had a similar problem when running the DB in Active/Active mode (and there, switching to talk directly to only one physical DB helped). So my assumption was that one of the reasons might be that you have a similar setup.
   
   Also, we've seen similar problems with various proxies that provided a kind of poor man's DB HA, where the proxy had several physical DB clusters behind it. We base our scheduler HA heavily on database locking, and locking is a hard problem to solve in an Active/Active setup.
   
   That leads to the suggestion that this might be a similar case for you. If it is not, and you are 100% sure that you have a single physical DB behind, then the problem needs deeper investigation; it will take quite some time to resolve, and possibly some iterations here to find the reason (because we have not seen it in our tests).
   
   So if you are 100% sure you do not have multiple DBs being accessed at the same time (even if a single proxy is used), then my advice would be to switch to Postgres, as it might take quite a lot of time to find the cause (we've seen it in the past: sometimes people used customized versions of the databases with some functionality disabled, for example). Postgres is much more stable and less configurable (MySQL, for example, can have multiple engines with different capabilities), and there might be many other reasons why MySQL (especially a custom-configured one) creates problems.
   
   Unfortunately, we do not have the capacity in the community to investigate individual users' cases deeply, so unless you have the time and capacity to investigate it yourself and provide more information, I am afraid it might take quite some time even to reproduce the kind of problem you have.
   
   Going with Postgres is a much more "certain" route, and if timing matters to you, I'd heartily recommend it.



[GitHub] [airflow] potiuk edited a comment on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk edited a comment on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884833816


   @ashb something that we need to discuss when you return: it seems (needs confirmation) that some people connect Airflow HA schedulers to a DB in Active/Active mode, and it causes the locking problem (MySQL in this case, as usual).
   
   I think we might want to be more explicit in Airflow about that, detect it (better), or possibly implement support for Active/Active mode (the best, but it might not be possible/easy). Happy to have a discussion on it when you are back from holidays ;)



[GitHub] [airflow] potiuk commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

potiuk commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884831959


   Just a note: the discussion at https://github.com/apache/airflow/discussions/14788 indicates that the problem might be that you are connecting Airflow to a DB in Active/Active HA mode. This is NOT supported. The database must be in Active/Passive HA mode: you either have to run your DB in Active/Passive mode, or explicitly configure Airflow to talk to only one DB server.



[GitHub] [airflow] sorabhgit commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884894015


   @potiuk Thanks for the explanation. And yes, we are only using a single DB in this case.
   As you suggested, we will try the same setup with a Postgres DB.



[GitHub] [airflow] sorabhgit commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884787462


   @potiuk Could you please help here? We are blocked by this error. I'm not sure whether this could be a bug in Airflow 2.1.0.
   Your help is highly appreciated.



[GitHub] [airflow] sorabhgit commented on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit commented on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884851067


   @potiuk Regarding your previous comment: we are not using the DB in Active/Active mode. We are just using a single MySQL DB and pointing two Airflow schedulers to the same DB. I believe this is the correct and recommended way to achieve scheduler HA with Airflow 2.1.0.



[GitHub] [airflow] sorabhgit edited a comment on issue #17127: Airflow 2.1.0 with Schedulers HA Failing

sorabhgit edited a comment on issue #17127:
URL: https://github.com/apache/airflow/issues/17127#issuecomment-884780351


   @namjals Yes, I am able to perform SELECT and INSERT on the MySQL DB.
   
   ```
   >>> # Get a cursor
   ... cur = cnx.cursor()
   >>>
   >>> # Execute a query
   ... cur.execute("SELECT CURDATE()")
   >>>
   >>> # Fetch one result
   ... row = cur.fetchone()
   >>> print("Current date is: {0}".format(row[0]))
   Current date is: 2021-07-22
   >>>
   >>> # Close connection
   ... cnx.close()
   >>>
   ```

