Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/07 13:32:22 UTC

[GitHub] [airflow] mattinbits opened a new issue #10221: Deadlock Exception with MSSQL as backend DB

mattinbits opened a new issue #10221:
URL: https://github.com/apache/airflow/issues/10221


   **Apache Airflow version**: 1.10.9
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: On-premises infrastructure. The scheduler is a CentOS 7 Docker image running on a RHEL 7 server. The database is SQL Server 2016.
   - **OS** (e.g. from /etc/os-release): CentOS Linux 7 (Core)
   - **Kernel** (e.g. `uname -a`): 3.10.0-1127.13.1.el7.x86_64
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   We have a DAG in which several file sensors wait for similar files in parallel, using "reschedule" mode. Intermittently, one or more of these tasks fail, and the logs show a deadlock reported by the database:
   
   ```
   sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('40001', '[40001] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 111) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLExecDirectW)')
   [SQL: SELECT count(*) AS count_1
   FROM task_instance
   WHERE task_instance.pool = ? AND task_instance.state IN (?, ?)]
   [parameters: ('default_pool', 'running', 'queued')]
   (Background on this error at: http://sqlalche.me/e/dbapi)
   ```
   Running this query directly in SSMS, I can see that it executes immediately and uses the `ti_pool` index.
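To look at the failing statement in isolation, here is a minimal sketch that recreates it against an in-memory SQLite database using Python's built-in `sqlite3` module. The table is a heavily simplified, hypothetical stand-in for `task_instance` (only the columns the query touches), and the `ti_pool` index over `(pool, state, priority_weight)` is an assumption modelled on the Airflow 1.10 schema. SQLite will not reproduce SQL Server's locking behaviour; the sketch only shows the statement and that it can be answered from the index, which matches what SSMS reports.

```python
import sqlite3

# Hypothetical, simplified stand-in for Airflow's task_instance table:
# only the columns the pool-slot query touches are included.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    " task_id TEXT, pool TEXT, state TEXT, priority_weight INTEGER)"
)
# Assumed index shape, modelled on the ti_pool index in Airflow 1.10.
conn.execute(
    "CREATE INDEX ti_pool ON task_instance (pool, state, priority_weight)"
)

rows = [
    ("sensor_a", "default_pool", "running", 1),
    ("sensor_b", "default_pool", "queued", 1),
    ("sensor_c", "default_pool", "up_for_reschedule", 1),
    ("other_task", "other_pool", "running", 1),
]
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)

# The same statement that SQL Server chose as the deadlock victim.
QUERY = (
    "SELECT count(*) AS count_1 FROM task_instance "
    "WHERE task_instance.pool = ? AND task_instance.state IN (?, ?)"
)
params = ("default_pool", "running", "queued")

(count,) = conn.execute(QUERY, params).fetchone()
print(count)  # 2: sensor_a (running) and sensor_b (queued)

# Ask SQLite how it answers the query; the last field of each plan row
# is the human-readable detail, which names the index being used.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

The sensor in `up_for_reschedule` is not counted, because only `running` and `queued` appear in the logged parameters.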
   
   And the associated stack trace:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 37, in <module>
       args.func(args)
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 75, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 545, in run
       _run(args, dag, ti)
     File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 460, in _run
       run_job.run()
     File "/usr/local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 221, in run
       self._execute()
     File "/usr/local/lib/python3.7/site-packages/airflow/jobs/local_task_job.py", line 90, in _execute
       pool=self.pool):
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 851, in _check_and_change_state_before_execution
       verbose=True):
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 70, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 644, in are_dependencies_met
       session=session):
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 668, in get_failed_dep_statuses
       dep_context):
     File "/usr/local/lib/python3.7/site-packages/airflow/ti_deps/deps/base_ti_dep.py", line 106, in get_dep_statuses
       for dep_status in self._get_dep_statuses(ti, session, dep_context):
     File "/usr/local/lib/python3.7/site-packages/airflow/ti_deps/deps/pool_slots_available_dep.py", line 62, in _get_dep_statuses
       open_slots = pools[0].open_slots()
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/pool.py", line 113, in open_slots
       return self.slots - self.occupied_slots(session)
     File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 70, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/pool.py", line 70, in occupied_slots
       .filter(TaskInstance.state.in_(STATES_TO_COUNT_AS_RUNNING))
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3469, in scalar
       ret = self.one()
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3436, in one
       ret = self.one_or_none()
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3405, in one_or_none
       ret = list(self)
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3481, in __iter__
       return self._execute_and_instances(context)
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/orm/query.py", line 3506, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1020, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_clauseelement
       distilled_params,
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1324, in _execute_context
       e, statement, parameters, cursor, context
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1518, in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1284, in _execute_context
       cursor, statement, parameters, context
     File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
       cursor.execute(statement, parameters)
   ```
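The bottom of the trace shows where the query comes from: `PoolSlotsAvailableDep` calls `Pool.open_slots()`, which returns `self.slots - self.occupied_slots(session)`, and `occupied_slots` counts task instances whose state is in `STATES_TO_COUNT_AS_RUNNING`. Below is a simplified, database-free sketch of that arithmetic; the `TaskInstanceRow` type and the function signatures are hypothetical, the counted states are taken from the logged query parameters, and `slots=128` is just an example value.

```python
from dataclasses import dataclass
from typing import Iterable

# States counted against a pool, taken from the logged query parameters.
STATES_TO_COUNT_AS_RUNNING = frozenset({"running", "queued"})


@dataclass(frozen=True)
class TaskInstanceRow:
    """Hypothetical, minimal view of a task_instance row."""
    task_id: str
    pool: str
    state: str


def occupied_slots(pool: str, tis: Iterable[TaskInstanceRow]) -> int:
    # Equivalent of: SELECT count(*) FROM task_instance
    #                WHERE pool = ? AND state IN ('running', 'queued')
    return sum(
        1 for ti in tis
        if ti.pool == pool and ti.state in STATES_TO_COUNT_AS_RUNNING
    )


def open_slots(pool: str, slots: int, tis: Iterable[TaskInstanceRow]) -> int:
    # Mirrors `self.slots - self.occupied_slots(session)` in pool.py.
    return slots - occupied_slots(pool, tis)


tis = [
    TaskInstanceRow("sensor_a", "default_pool", "running"),
    TaskInstanceRow("sensor_b", "default_pool", "queued"),
    TaskInstanceRow("sensor_c", "default_pool", "up_for_reschedule"),
]
print(open_slots("default_pool", 128, tis))  # 126
```

As the trace suggests, this check runs in `_check_and_change_state_before_execution` each time a task instance starts, so several sensors in reschedule mode each issue this count query on every poke.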
   
   **What you expected to happen**:
   The tasks succeed or reschedule as normal.
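SQL Server's own message ends with "Rerun the transaction.", and for a read-only statement like this count a retry is usually a reasonable response to SQLSTATE `40001` / error 1205. The following is a minimal sketch of such a retry wrapper under that assumption; it is not something Airflow 1.10.9 provides, and `is_deadlock`, `retry_on_deadlock` and the backoff values are hypothetical. It looks at the `orig` attribute in which SQLAlchemy's `DBAPIError` keeps the underlying driver exception; as the log shows, pyodbc puts the SQLSTATE in the first argument.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")

DEADLOCK_SQLSTATE = "40001"      # SQLSTATE from the logged pyodbc error
DEADLOCK_NATIVE_CODE = "(1205)"  # SQL Server deadlock-victim error number


def is_deadlock(exc: BaseException) -> bool:
    """Return True if exc looks like a SQL Server deadlock-victim error.

    SQLAlchemy wraps the driver error and keeps it in `.orig`; fall back
    to the exception itself when there is no wrapper.
    """
    orig = getattr(exc, "orig", None) or exc
    args = getattr(orig, "args", ())
    if args and args[0] == DEADLOCK_SQLSTATE:
        return True
    return any(DEADLOCK_NATIVE_CODE in str(a) for a in args)


def retry_on_deadlock(attempts: int = 3, base_delay: float = 0.5):
    """Hypothetical decorator: rerun a call that was chosen as deadlock victim."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on any other error.
                    if attempt == attempts or not is_deadlock(exc):
                        raise
                    # Exponential backoff before rerunning the transaction.
                    time.sleep(base_delay * 2 ** (attempt - 1))
            raise AssertionError("unreachable")
        return wrapper
    return decorator
```

With a real SQLAlchemy session, the failed transaction would also have to be rolled back before the retry; that step is left out of this sketch.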
   
   **How to reproduce it**:
   
   I have not been able to reproduce this locally, away from our production environment. I suspect it is related to the size of the `task_instance` table, which would make it hard to reproduce on a clean local Airflow instance.
   
   **Anything else we need to know**:
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #10221: Deadlock Exception with MSSQL as backend DB

mik-laj commented on issue #10221:
URL: https://github.com/apache/airflow/issues/10221#issuecomment-701095136


   MSSQL is not officially supported by Airflow. See: https://github.com/apache/airflow/issues/10713
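A quick way to confirm which metadata backend a deployment is actually running on is to read the dialect from the `sql_alchemy_conn` URL in `airflow.cfg`. This is a minimal standard-library sketch; the helper names are hypothetical, and the set of supported dialects is an assumption based on the backends documented for Airflow 1.10 (PostgreSQL, MySQL, and SQLite for development), so check the documentation for your version.

```python
from urllib.parse import urlsplit

# Assumed set of officially supported metadata-DB dialects for Airflow 1.10.x.
SUPPORTED_DIALECTS = {"postgresql", "mysql", "sqlite"}


def backend_dialect(sql_alchemy_conn: str) -> str:
    """Return the dialect of a SQLAlchemy URL, e.g. 'mssql' for 'mssql+pyodbc://...'."""
    scheme = urlsplit(sql_alchemy_conn).scheme
    # Drop the driver suffix such as '+pyodbc' or '+psycopg2'.
    return scheme.split("+", 1)[0].lower()


def is_supported_backend(sql_alchemy_conn: str) -> bool:
    return backend_dialect(sql_alchemy_conn) in SUPPORTED_DIALECTS


conn_str = "mssql+pyodbc://user:secret@dbhost/airflow?driver=ODBC+Driver+17+for+SQL+Server"
print(backend_dialect(conn_str), is_supported_backend(conn_str))  # mssql False
```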





[GitHub] [airflow] boring-cyborg[bot] commented on issue #10221: Deadlock Exception with MSSQL as backend DB

boring-cyborg[bot] commented on issue #10221:
URL: https://github.com/apache/airflow/issues/10221#issuecomment-670518366


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk closed issue #10221: Deadlock Exception with MSSQL as backend DB

potiuk closed issue #10221:
URL: https://github.com/apache/airflow/issues/10221


   

