Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/19 15:01:05 UTC
[GitHub] [airflow] Bruschkov opened a new issue #20956: Scheduler crashes due to "MySQLdb IntegrityError: Duplicate entry" (Airflow 2.1.1)
Bruschkov opened a new issue #20956:
URL: https://github.com/apache/airflow/issues/20956
### Apache Airflow version
2.1.1
### What happened
The scheduler crashes regularly with error messages like this:
```
MySQLdb._exceptions.IntegrityError: (1062, "Duplicate entry 'some-ETL-2022-01-19 14:00:00.000000' for key 'dag_id'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 91, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/scheduler_command.py", line 64, in scheduler
    job.run()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/base_job.py", line 237, in run
    self._execute()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1303, in _execute
    self._run_scheduler_loop()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1396, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1492, in _do_scheduling
    self._create_dagruns_for_dags(guard, session)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/retries.py", line 76, in wrapped_function
    for attempt in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
  File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 390, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 356, in iter
    return fut.result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/retries.py", line 85, in wrapped_function
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1583, in _create_dagruns_for_dags
    self._create_dag_runs(query.all(), session)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1625, in _create_dag_runs
    run = dag.create_dagrun(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py", line 1796, in create_dagrun
    session.flush()
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2523, in flush
    self._flush(objects)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2664, in _flush
    transaction.rollback(_capture_exception=True)
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 2624, in _flush
    flush_context.execute()
  File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
...
```
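To see which DAG and which run the collision is about, the duplicate value in the MySQL message can be split back into its parts. This is a minimal sketch under assumptions not confirmed in this issue: that the key named `dag_id` is the `dag_run` unique constraint over `(dag_id, execution_date)` (the failing call is `dag.create_dagrun`, so the insert goes into `dag_run`), and that MySQL/MariaDB renders the composite value as `<dag_id>-<timestamp>`. Because a dag_id can itself contain `-` (as `some-ETL` does), the pattern anchors on the trailing timestamp instead of splitting on the first hyphen. `parse_duplicate_entry` is a hypothetical helper, not an Airflow API.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Assumption: MySQL/MariaDB error 1062 renders a composite unique key as
# "<dag_id>-<execution_date>" inside "Duplicate entry '...'".
# The dag_id group is greedy, so the match ends at the LAST "-" that is
# followed by a full timestamp; hyphens inside the dag_id are preserved.
_DUPLICATE_ENTRY = re.compile(
    r"Duplicate entry '(?P<dag_id>.+)-"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{1,6})?)'"
    r" for key '(?P<key>[^']+)'"
)


def parse_duplicate_entry(message: str) -> Optional[Tuple[str, datetime, str]]:
    """Hypothetical helper: return (dag_id, execution_date, key_name),
    or None if the message is not a duplicate-entry error of this shape."""
    match = _DUPLICATE_ENTRY.search(message)
    if match is None:
        return None
    ts = match.group("ts")
    # Fractional seconds are optional in the rendered timestamp.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in ts else "%Y-%m-%d %H:%M:%S"
    return match.group("dag_id"), datetime.strptime(ts, fmt), match.group("key")


if __name__ == "__main__":
    error = (
        "(1062, \"Duplicate entry 'some-ETL-2022-01-19 14:00:00.000000' "
        "for key 'dag_id'\")"
    )
    print(parse_duplicate_entry(error))
```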
### What you expected to happen
We would not expect these errors to occur: according to https://github.com/apache/airflow/issues/9148 and https://github.com/apache/airflow/issues/13925, this issue should have been fixed a couple of versions ago.
### How to reproduce
_No response_
### Operating System
Kubernetes
### Versions of Apache Airflow Providers
```
apache-airflow-providers-amazon==2.0.0
apache-airflow-providers-celery==2.0.0
apache-airflow-providers-cncf-kubernetes==2.0.0
apache-airflow-providers-docker==2.0.0
apache-airflow-providers-elasticsearch==2.0.1
apache-airflow-providers-ftp==2.0.0
apache-airflow-providers-google==4.0.0
apache-airflow-providers-grpc==2.0.0
apache-airflow-providers-hashicorp==2.0.0
apache-airflow-providers-http==2.0.0
apache-airflow-providers-imap==2.0.0
apache-airflow-providers-microsoft-azure==3.0.0
apache-airflow-providers-mysql==2.0.0
apache-airflow-providers-odbc==2.0.0
apache-airflow-providers-postgres==2.0.0
apache-airflow-providers-redis==2.0.0
apache-airflow-providers-sendgrid==2.0.0
apache-airflow-providers-sftp==2.0.0
apache-airflow-providers-slack==4.0.0
apache-airflow-providers-sqlite==2.0.0
apache-airflow-providers-ssh==2.0.0
```
### Deployment
Other 3rd-party Helm chart
### Deployment details
Deployed via https://artifacthub.io/packages/helm/airflow-helm/airflow/8.5.0 to a Kubernetes cluster (Kubernetes 1.18).
The metadata database backend is MariaDB 10.3.31.
Docker image used as base image: apache/airflow:2.1.1-python3.8
Additional python dependencies installed:
```
airflow-exporter==1.5.2
boto3==1.18.58
s3fs==0.4.*
pandas==1.3.3
sqlalchemy==1.3.18
sqlalchemy-redshift==0.8.2
smart_open[aws]==2.1.*
# Use PyMySQL as dialect to fix SSL connection error
PyMySQL==1.0.2
```
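One thing that may be worth checking, if PyMySQL was meant to be used for the metadata database as well: the traceback above is raised from `MySQLdb._exceptions`, i.e. the mysqlclient driver. Installing PyMySQL does not by itself change the driver SQLAlchemy uses; that is chosen by the scheme of the connection URL (in Airflow 2.1, the `[core] sql_alchemy_conn` option): `mysql+pymysql://...` selects PyMySQL, while `mysql://...` or `mysql+mysqldb://...` selects mysqlclient. The sketch below is a hypothetical, standard-library-only helper that reports the dialect and driver a URL would select; it assumes a bare `mysql` scheme maps to `mysqldb`, SQLAlchemy's default MySQL driver.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: SQLAlchemy's default driver for a bare "mysql://" URL is
# mysqldb (the mysqlclient package, imported as MySQLdb).
_DEFAULT_DRIVERS = {"mysql": "mysqldb"}


def sqlalchemy_driver(conn_url: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a URL scheme such as "mysql+pymysql" into
    (dialect, driver). The driver is None when the scheme names none and no
    default is listed above."""
    scheme = urlsplit(conn_url).scheme  # "+" is a valid scheme character
    dialect, _, driver = scheme.partition("+")
    return dialect, driver or _DEFAULT_DRIVERS.get(dialect)


if __name__ == "__main__":
    for url in (
        "mysql://airflow:secret@mariadb:3306/airflow",
        "mysql+pymysql://airflow:secret@mariadb:3306/airflow",
    ):
        print(sqlalchemy_driver(url))
```

The host names and credentials in the usage URLs are placeholders.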
Relevant parts of the airflow configuration:
```
airflow:
  config:
    # [core]
    AIRFLOW__CORE__PARALLELISM: "24"
    AIRFLOW__CORE__DAG_CONCURRENCY: "20"
    AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: "1"
    AIRFLOW__CORE__LOAD_EXAMPLES: "False"
    AIRFLOW__CORE__STORE_SERIALIZED_DAGS: "False"
```
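The keys under `airflow.config` are environment variable names that follow Airflow's `AIRFLOW__{SECTION}__{KEY}` convention, so each one overrides an option in `airflow.cfg`; for example, `AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG` corresponds to `[core] max_active_runs_per_dag`. The hypothetical helper below sketches that mapping, which can help when comparing these values against the configuration reference. It deliberately does not special-case the `_CMD` and `_SECRET` variants.

```python
from typing import Dict, Optional, Tuple

PREFIX = "AIRFLOW__"


def airflow_env_to_option(name: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: map AIRFLOW__{SECTION}__{KEY} to (section, key)
    as written in airflow.cfg. _CMD/_SECRET variants are not special-cased."""
    if not name.startswith(PREFIX):
        return None
    section, sep, key = name[len(PREFIX):].partition("__")
    if not (sep and section and key):
        return None
    return section.lower(), key.lower()


if __name__ == "__main__":
    # Values copied from the Helm values in this issue.
    config: Dict[str, str] = {
        "AIRFLOW__CORE__PARALLELISM": "24",
        "AIRFLOW__CORE__DAG_CONCURRENCY": "20",
        "AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG": "1",
        "AIRFLOW__CORE__LOAD_EXAMPLES": "False",
        "AIRFLOW__CORE__STORE_SERIALIZED_DAGS": "False",
    }
    for env_name, value in config.items():
        section, key = airflow_env_to_option(env_name)
        print(f"[{section}] {key} = {value}")
```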
### Anything else
On average, the scheduler restarts between 1 and 10 times per hour with the above error message.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #20956: Scheduler crashes due to "MySQLdb IntegrityError: Duplicate entry" (Airflow 2.1.1)
potiuk commented on issue #20956:
URL: https://github.com/apache/airflow/issues/20956#issuecomment-1024976715
It's impossible to reproduce this from what we have here. You likely have some duplication in your dag_ids: the same dag_id coming from multiple files. But it's hard to say what the problem is. Converting it into a discussion. If you provide more information on your DAG structure, the logical DAGs you have, and the dynamic DAG generation schemes you use, maybe we can help.
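The suggestion above is that the same dag_id may be defined in more than one file. A rough static check for that is sketched below, assuming DAGs are created with literal string ids passed to a callable named `DAG`; dynamically generated DAGs, ids built from variables or f-strings, and `@dag`-decorated functions are not detected, so an empty result does not rule duplication out. `literal_dag_ids` and `find_duplicate_dag_ids` are hypothetical helpers built on Python's `ast` module; files are parsed, not imported, so no DAG code is executed.

```python
import ast
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterator, List


def literal_dag_ids(source: str) -> Iterator[str]:
    """Yield dag_ids passed as string literals to calls named DAG, e.g.
    DAG("x", ...), DAG(dag_id="x", ...) or airflow.DAG("x"). Ids built with
    f-strings, variables or loops are not detected."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both a bare name (DAG) and an attribute (airflow.DAG).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        # dag_id is the first positional argument or the dag_id keyword.
        candidates = node.args[:1] + [kw.value for kw in node.keywords if kw.arg == "dag_id"]
        for arg in candidates:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                yield arg.value


def find_duplicate_dag_ids(dags_folder: str) -> Dict[str, List[str]]:
    """Map each literal dag_id seen more than once to the files defining it."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(Path(dags_folder).rglob("*.py")):
        try:
            ids = list(literal_dag_ids(path.read_text(encoding="utf-8")))
        except (SyntaxError, ValueError, OSError):
            continue  # skip unreadable or unparseable files
        for dag_id in ids:
            seen[dag_id].append(str(path))
    return {dag_id: files for dag_id, files in seen.items() if len(files) > 1}
```

Called as `find_duplicate_dag_ids("/path/to/dags")` (a placeholder for the actual DAGs folder), it returns a dict mapping each repeated dag_id to the files that define it; an id defined twice in the same file is listed twice.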
[GitHub] [airflow] boring-cyborg[bot] commented on issue #20956: Scheduler crashes due to "MySQLdb IntegrityError: Duplicate entry" (Airflow 2.1.1)
boring-cyborg[bot] commented on issue #20956:
URL: https://github.com/apache/airflow/issues/20956#issuecomment-1016551528
Thanks for opening your first issue here! Be sure to follow the issue template!