Posted to commits@airflow.apache.org by "C-K-Loan (via GitHub)" <gi...@apache.org> on 2023/02/16 12:25:47 UTC
[GitHub] [airflow] C-K-Loan opened a new issue, #29572: Database is Locked -- Even when deleting database and using new airflow home dir
C-K-Loan opened a new issue, #29572:
URL: https://github.com/apache/airflow/issues/29572
### Apache Airflow version
2.5.1
### What happened
My Airflow started giving `Database is Locked` errors. I tried uninstalling everything and deleting `~/airflow`, but kept getting db locked errors.
My final try was to set everything up in a new home directory, i.e. I set
`export AIRFLOW_HOME=~/airflow2`
then I create a new DB
```
python3 -m airflow db init
```
gives the following logs
```
DB: sqlite:////home/ckl/airflow2/airflow.db
[2023-02-16 13:05:40,430] {migration.py:204} INFO - Context impl SQLiteImpl.
[2023-02-16 13:05:40,430] {migration.py:207} INFO - Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running stamp_revision -> 290244fb8b83
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
Initialization done
```
then I start the scheduler in one shell
`python3 -m airflow scheduler`
with logs
```
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2023-02-16 13:14:05 +0100] [4170653] [INFO] Starting gunicorn 20.1.0
[2023-02-16 13:14:05 +0100] [4170653] [INFO] Listening at: http://[::]:8793 (4170653)
[2023-02-16 13:14:05 +0100] [4170653] [INFO] Using worker: sync
[2023-02-16 13:14:05 +0100] [4170654] [INFO] Booting worker with pid: 4170654
[2023-02-16 13:14:05 +0100] [4170655] [INFO] Booting worker with pid: 4170655
[2023-02-16 13:14:07,512] {scheduler_job.py:714} INFO - Starting the scheduler
[2023-02-16 13:14:07,512] {scheduler_job.py:719} INFO - Processing each file at most -1 times
[2023-02-16 13:14:07,514] {executor_loader.py:107} INFO - Loaded executor: SequentialExecutor
[2023-02-16 13:14:07,520] {manager.py:163} INFO - Launched DagFileProcessorManager with pid: 4170697
[2023-02-16 13:14:07,521] {scheduler_job.py:1408} INFO - Resetting orphaned tasks for active dag runs
[2023-02-16 13:14:07,577] {settings.py:58} INFO - Configured default timezone Timezone('UTC')
[2023-02-16T13:14:07.586+0100] {manager.py:409} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
```
and in a separate shell I start the webserver
`python3 -m airflow webserver -p 6969`, which creates all the default users etc. (this raises no exceptions, all good).
Now the problem occurs: while Airflow is initializing all these users, the scheduler shell starts throwing exceptions
```
2023-02-16T13:14:07.586+0100] {manager.py:409} WARNING - Because we cannot use more than 1 thread (parsing_processes = 2) when using sqlite. So we set parallelism to 1.
Process DagFileProcessor4-Process:
Traceback (most recent call last):
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/dag_processing/processor.py", line 174, in _run_file_processor
_handle_dag_file_processing()
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/dag_processing/processor.py", line 155, in _handle_dag_file_processing
result: tuple[int, int] = dag_file_processor.process_file(
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/dag_processing/processor.py", line 768, in process_file
dagbag.sync_to_db(processor_subdir=self._dag_directory, session=session)
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 72, in wrapper
return func(*args, **kwargs)
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/models/dagbag.py", line 645, in sync_to_db
for attempt in run_with_db_retries(logger=self.log):
File "/home/ckl/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 384, in __iter__
do = self.iter(retry_state=retry_state)
File "/home/ckl/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 362, in iter
raise retry_exc.reraise()
File "/home/ckl/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 195, in reraise
raise self.last_attempt.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/models/dagbag.py", line 657, in sync_to_db
serialize_errors.extend(_serialize_dag_capturing_errors(dag, session))
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/models/dagbag.py", line 628, in _serialize_dag_capturing_errors
dag_was_updated = SerializedDagModel.write_dag(
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 72, in wrapper
return func(*args, **kwargs)
File "/home/ckl/.local/lib/python3.10/site-packages/airflow/models/serialized_dag.py", line 152, in write_dag
.scalar()
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2892, in scalar
ret = self.one()
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2869, in one
return self._iter().one()
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 2915, in _iter
result = self.session.execute(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1714, in execute
result = conn._execute_20(statement, params or {}, execution_options)
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
ret = self._execute_context(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
self._handle_dbapi_exception(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
util.raise_(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 210, in raise_
raise exception
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/home/ckl/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: SELECT ? AS anon_1
FROM serialized_dag
WHERE serialized_dag.dag_id = ? AND serialized_dag.last_updated > ?]
[parameters: (1, 'example_xcom_args', '2023-02-16 12:14:36.805117')]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```
The UI is accessible, but it says
```
The scheduler does not appear to be running. Last heartbeat was received 5 minutes ago.
The DAGs list may not update, and new tasks will not be scheduled.
```
This essentially leaves my airflow in an unusable state.
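For context on the error itself: SQLite allows only one writer at a time, so a second connection that tries to start a write transaction while another holds the write lock fails with `database is locked` — which is exactly what happens when the scheduler and webserver write to the same `airflow.db` concurrently. A minimal sketch reproducing this outside Airflow (the file path is arbitrary, and `timeout=0` makes the second writer fail immediately instead of waiting):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
# isolation_level=None puts the connections in autocommit mode so we can
# issue explicit BEGIN IMMEDIATE statements ourselves.
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
other = sqlite3.connect(path, timeout=0, isolation_level=None)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("BEGIN IMMEDIATE")  # take the write lock and hold it
writer.execute("INSERT INTO t VALUES (1)")

try:
    other.execute("BEGIN IMMEDIATE")  # second writer is refused (timeout=0)
    locked = False
except sqlite3.OperationalError:
    locked = True

writer.execute("COMMIT")
print("second writer saw 'database is locked':", locked)
```

With a positive `timeout`, the second connection would instead busy-wait for up to that many seconds before raising the same error, which is why Airflow's retries (via `tenacity`) eventually give up under sustained contention.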
### What you think should happen instead
The scheduler should not throw `database is locked` exceptions.
### How to reproduce
See main post
### Operating System
Pop!_OS 22.04 LTS
### Versions of Apache Airflow Providers
```
apache-airflow 2.5.1
apache-airflow-providers-amazon 6.1.0
apache-airflow-providers-common-sql 1.3.0
apache-airflow-providers-ftp 3.2.0
apache-airflow-providers-http 4.1.0
apache-airflow-providers-imap 3.1.0
apache-airflow-providers-sqlite 3.3.0
```
### Deployment
Virtualenv installation
### Deployment details
I just set it up via shell on my private server.
### Anything else
It had been working fine for a long time.
Today a job started throwing OOM errors, and since then my Airflow is all messed up.
Maybe some rogue process is still around?
But to make sure that's not the case, I set everything up in a new directory.
Not sure what else to do; I have no physical access to the machine right now, and restarting it is not an option.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] Taragolis closed issue #29572: Database is Locked -- Even when deleting database and using new airflow home dir
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis closed issue #29572: Database is Locked -- Even when deleting database and using new airflow home dir
URL: https://github.com/apache/airflow/issues/29572
[GitHub] [airflow] Taragolis commented on issue #29572: Database is Locked -- Even when deleting database and using new airflow home dir
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29572:
URL: https://github.com/apache/airflow/issues/29572#issuecomment-1433284718
Airflow supports SQLite only for development purposes.
I would recommend using the [`airflow standalone`](https://airflow.apache.org/docs/apache-airflow/stable/start.html) command if you need to set up Airflow quickly for developing DAGs.
If you feel confident with Docker or K8s, you can run it there; some docs:
- [Running Airflow in Docker](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)
- [Helm Chart for Apache Airflow](https://airflow.apache.org/docs/helm-chart/stable/index.html)
I would also recommend using any other [supported backend](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#choosing-database-backend). Please note that if you choose MS SQL Server, it is an [experimental feature](https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#experimental-features), so it is better to choose PostgreSQL or MySQL 8.
And last but not least, with any database backend other than SQLite you can use other executors. For example, if you run on a single machine and do not have intensive tasks, you could select the [LocalExecutor](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/local.html); the reason to pick another executor is simple: the [SequentialExecutor](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/sequential.html) is not recommended for production usage.
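A minimal sketch of what switching to PostgreSQL with the LocalExecutor might look like (the host, credentials, and database name below are placeholder assumptions; it also assumes the Postgres extras are installed, e.g. `pip install 'apache-airflow[postgres]'`):

```shell
# Placeholder connection string -- replace user/password/host/db with your own.
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost:5432/airflow"
# LocalExecutor runs tasks in parallel on a single machine.
export AIRFLOW__CORE__EXECUTOR="LocalExecutor"

airflow db init           # create the metadata tables in PostgreSQL
airflow scheduler &       # scheduler and webserver can now write concurrently
airflow webserver -p 8080
```

With a client/server backend like PostgreSQL, concurrent writes from the scheduler, webserver, and task processes are handled by the database's own locking, so the `database is locked` failure mode above disappears.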
[GitHub] [airflow] boring-cyborg[bot] commented on issue #29572: Database is Locked -- Even when deleting database and using new airflow home dir
Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29572:
URL: https://github.com/apache/airflow/issues/29572#issuecomment-1433010636
Thanks for opening your first issue here! Be sure to follow the issue template!