Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/24 13:00:42 UTC
[GitHub] [airflow] avinovarov opened a new issue #22504: Random timeouts on creating connections in kubernetes executors
avinovarov opened a new issue #22504:
URL: https://github.com/apache/airflow/issues/22504
### Apache Airflow version
2.2.3
### What happened
**The problem**
- Under load, with hundreds of DAGs running in parallel, Airflow executors RANDOMLY throw errors when creating connections:
```
(some connections successfully created)
...
creating: raw/pg_services/folder/connection_name
[2022-03-23 02:39:43,102] {connection.py:404} ERROR - Unable to retrieve connection from secrets backend (MetastoreBackend). Checking subsequent secrets backend.
```
The error hits random connections on roughly 25-50% of Airflow workers; the remaining workers create their connections successfully.
**These timeouts happen only when we have dozens of DAGs running in parallel.**
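To put a firmer number on the "25-50% of workers" estimate, the error line can be counted per worker. Below is a minimal sketch, assuming you already have each worker pod's log text collected into a dict. The `affected_share` helper and its input shape are hypothetical; only the matched message text comes from the log above.

```python
import re
from collections import Counter

# Matches the error line quoted in this issue and captures the backend name.
ERROR_RE = re.compile(
    r"Unable to retrieve connection from secrets backend \((?P<backend>\w+)\)"
)


def affected_share(logs_by_worker):
    """Return (share of workers whose log contains the error, Counter of backend names).

    `logs_by_worker` maps a worker/pod name to its raw log text (hypothetical
    input; gather it however you collect pod or task logs).
    """
    backends = Counter()
    affected = 0
    for text in logs_by_worker.values():
        hits = [m.group("backend") for m in ERROR_RE.finditer(text)]
        if hits:
            affected += 1
            backends.update(hits)
    total = len(logs_by_worker)
    return (affected / total if total else 0.0), backends
```

Running it separately on logs from a single-DAG run and from a parallel run would show whether the affected share really tracks the level of parallelism.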
### What you think should happen instead
We'd expect connections to be created reliably =)
### How to reproduce
- Deploy Airflow to k8s and add connections to multiple Postgres databases (we have 75)
- Run dozens of DAGs in parallel.
### Operating System
k8s via rancher, on CentOS 7
### Versions of Apache Airflow Providers
apache-airflow-providers-postgres==2.4.0
### Deployment
Other 3rd-party Helm chart
### Deployment details
**Our setup**
- Airflow on kubernetes, with KubernetesExecutor, installed with [user community Helm chart](https://github.com/airflow-helm/charts/blob/main/charts/airflow/values.yaml)
- 75 connections to various sources, mainly Postgres databases, specified in helm chart values, like this:
```yaml
# this is how we add connections with credentials in helm chart values
connections:
  - id: pg_connection
    type: postgres
    host: database.domain.com
    login: $PG_LOGIN
    password: $PG_PASSWORD
    port: 5432
    schema: database
# and specify credentials with secrets below
connectionsTemplates:
  PG_LOGIN:
    kind: secret
    name: airflow-secrets
    key: PG_LOGIN
```
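The `connectionsTemplates` block maps each `$NAME` placeholder in the connection fields to a key in a Kubernetes secret. The sketch below illustrates that substitution step using Python's `string.Template`. It assumes the chart resolves placeholders in a comparable way and is not the chart's actual code. The `connection` dict, the `render_connection` helper and the secret values are made up for illustration.

```python
from string import Template

# Connection fields as written in the Helm values above.
connection = {
    "id": "pg_connection",
    "type": "postgres",
    "host": "database.domain.com",
    "login": "$PG_LOGIN",
    "password": "$PG_PASSWORD",
    "port": 5432,
    "schema": "database",
}


def render_connection(conn, env):
    """Substitute $VAR placeholders in string fields using values from `env`.

    safe_substitute leaves unknown placeholders untouched instead of raising,
    so a variable with no template mapping shows up verbatim in the result.
    """
    return {
        key: Template(value).safe_substitute(env) if isinstance(value, str) else value
        for key, value in conn.items()
    }
```

With this substitution style, a placeholder that has no entry under `connectionsTemplates` stays as the literal `$PG_PASSWORD` text instead of failing loudly. That is why it is worth checking that every variable used in `connections` is mapped.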
Of course, the referenced k8s secrets are deployed in our `airflow` namespace, and as long as we run individual DAGs we observe no errors.
### Anything else
**As long as we run individual DAGs we observe no errors.**
Based on the timeout error, we assume the problem lies in retrieving credentials (the lookup apparently falls back to the next secrets backend), not in connecting to the Postgres databases themselves, but this is only a guess. We also don't observe any overload on our Postgres databases.
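The fallback behaviour described above matches the log message quoted in the issue: Airflow tries each configured secrets backend in order, and when one raises, it logs that message and moves on to the next. Below is a simplified, self-contained model of that loop. The backend classes are stand-ins, `ConnectionNotFound` and the `fail` flag are hypothetical, and the order assumes Airflow's default search path (environment variables first, then the metastore).

```python
import logging

log = logging.getLogger("connection_lookup_sketch")


class ConnectionNotFound(Exception):
    """Hypothetical stand-in for the "connection not defined" error."""


def get_connection_from_backends(conn_id, backends):
    """Return the first connection found, trying backends in order.

    An exception from one backend is logged and the next backend is tried;
    if every backend fails or finds nothing, ConnectionNotFound is raised.
    """
    for backend in backends:
        try:
            conn = backend.get_connection(conn_id)
            if conn:
                return conn
        except Exception:
            log.exception(
                "Unable to retrieve connection from secrets backend (%s). "
                "Checking subsequent secrets backend.",
                type(backend).__name__,
            )
    raise ConnectionNotFound(f"The conn_id `{conn_id}` isn't defined")


class EnvironmentVariablesBackend:
    """Stand-in: behaves as if no AIRFLOW_CONN_* variable is set."""

    def get_connection(self, conn_id):
        return None


class MetastoreBackend:
    """Stand-in for the metadata-DB lookup; `fail=True` simulates the query raising."""

    def __init__(self, conns, fail=False):
        self.conns = conns
        self.fail = fail

    def get_connection(self, conn_id):
        if self.fail:
            raise TimeoutError("simulated metadata DB timeout")
        return self.conns.get(conn_id)
```

With `fail=True` the sketch logs the same message as in the issue and then raises. Under the default order the metastore is the last backend, so no backend is left to check after it fails. In that case the message names the backend whose query raised, rather than a fallback provider that was actually consulted.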
Googling the error didn't help much, so we'd be grateful for any advice.
Thanks!
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #22504: Random timeouts on creating connections in kubernetes executors
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22504:
URL: https://github.com/apache/airflow/issues/22504#issuecomment-1077602243
Thanks for opening your first issue here! Be sure to follow the issue template!