Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/16 11:29:06 UTC
[GitHub] [airflow] hussainsaify opened a new issue #22307: Airflow webserver stops responding and then restarted by liveness probes
hussainsaify opened a new issue #22307:
URL: https://github.com/apache/airflow/issues/22307
### Apache Airflow version
2.2.4 (latest released)
### What happened
Hi Team,
After upgrading to Airflow 2.2.3, we started facing an issue where the webserver gets stuck and is eventually restarted after failing its liveness probes. The issue is transient; we see it 2-3 times per week. We also see a high CPU spike at the time the issue occurs.
Cloud - AWS EKS 1.21
Helm Chart - 1.4.0
Current airflow version- 2.2.4
Please let me know if you need more details.
Thanks,
Hussain
### What you think should happen instead
The webserver should continue running and respond to requests.
### How to reproduce
The issue is transient, occurs 2-3 times per week.
### Operating System
Debian GNU/Linux 10 (buster)
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==3.0.0
apache-airflow-providers-apache-hive==2.2.0
apache-airflow-providers-celery==2.1.0
apache-airflow-providers-cncf-kubernetes==3.0.2
apache-airflow-providers-docker==2.4.1
apache-airflow-providers-elasticsearch==2.2.0
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-google==6.4.0
apache-airflow-providers-grpc==2.0.1
apache-airflow-providers-hashicorp==2.1.1
apache-airflow-providers-http==2.0.3
apache-airflow-providers-imap==2.2.0
apache-airflow-providers-microsoft-azure==3.6.0
apache-airflow-providers-microsoft-mssql==2.1.0
apache-airflow-providers-mysql==2.2.0
apache-airflow-providers-odbc==2.0.1
apache-airflow-providers-postgres==3.0.0
apache-airflow-providers-redis==2.0.1
apache-airflow-providers-sendgrid==2.0.1
apache-airflow-providers-sftp==2.4.1
apache-airflow-providers-slack==4.2.0
apache-airflow-providers-sqlite==2.1.0
apache-airflow-providers-ssh==2.4.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Please find below helm values.
defaultAirflowTag: "2.2.4"
airflowVersion: "2.2.4"
labels:
  spotinst.io/restrict-scale-down: "true"
nodeSelector:
  node-class: worker
fernetKeySecretName: airflow-fernet-key
webserverSecretKeySecretName: airflow-webserver-secret-key
ingress:
  enabled: true
  web:
    host: "airflow.************.com"
    annotations:
      kubernetes.io/ingress.class: "nginx"
    hosts:
      - name: "airflow.**************.com"
        tls:
          enabled: true
          secretName: "tls-wildcard"
  # Configs for the Ingress of the flower Service
  flower:
    path: "/flower"
    pathType: "ImplementationSpecific"
    host: "airflow.************.com"
    annotations:
      kubernetes.io/ingress.class: "nginx"
    hosts:
      - name: "airflow.***********.com"
        tls:
          enabled: true
          secretName: "tls-wildcard"
airflowPodAnnotations:
  ad.datadoghq.com/airflow-web.check_names: '["airflow"]'
  ad.datadoghq.com/airflow-web.init_configs: '[{}]'
  ad.datadoghq.com/airflow-web.instances: |
    [
      {
        "url": "http://%%host%%:8080"
      }
    ]
executor: "KubernetesExecutor"
extraEnv: |
  - name: AIRFLOW__CORE__FERNET_KEY
    valueFrom:
      secretKeyRef:
        name: airflow-fernet-key
        key: fernet-key
  - name: AZURE_TENANT_ID
    valueFrom:
      secretKeyRef:
        name: airflow-azuread-creds
        key: tenant_id
  - name: AZURE_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: airflow-azuread-creds
        key: client_secret
  - name: AZURE_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: airflow-azuread-creds
        key: client_id
  - name: AIRFLOW__METRICS__STATSD_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
env:
  - name: AIRFLOW_CONN_AWS_LOG
    value: "aws://"
  - name: AIRFLOW__CORE__REMOTE_LOGGING
    value: "True"
  - name: AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER
    value: "s3://***********/airflow-etl/logs"
  - name: AIRFLOW__CORE__REMOTE_LOG_CONN_ID
    value: "aws_development"
  - name: AIRFLOW__CORE__ENCRYPT_S3_LOGS
    value: "True"
  - name: AIRFLOW__CORE__PARALLELISM
    value: "64"
  - name: AIRFLOW__CORE__DAG_CONCURRENCY
    value: "32"
  - name: AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG
    value: "32"
  - name: AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE
    value: "10"
  - name: AIRFLOW__CORE__SQL_ALCHEMY_MAX_OVERFLOW
    value: "30"
  - name: AIRFLOW__SMTP__SMTP_HOST
    value: "smtp.mail-relay.svc"
  - name: AIRFLOW__SMTP__SMTP_MAIL_FROM
    value: "***********"
  - name: AIRFLOW__METRICS__STATSD_ON
    value: "True"
  - name: AIRFLOW__METRICS__STATSD_PORT
    value: "8125"
  - name: AIRFLOW__METRICS__STATSD_PREFIX
    value: "airflow"
  - name: AIRFLOW__WEBSERVER__AUTHENTICATE
    value: "True"
  - name: AIRFLOW__WEBSERVER__EXPOSE_CONFIG
    value: "True"
  - name: AIRFLOW__WEBSERVER__RBAC
    value: "True"
  - name: AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX
    value: "True"
  - name: AIRFLOW__LOGGING__FAB_LOGGING_LEVEL
    value: "WARN"
  - name: AIRFLOW__LOGGING__LOGGING_LEVEL
    value: "DEBUG"
  - name: AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
    value: "240.0"
  - name: AIRFLOW__CELERY__FLOWER_URL_PREFIX
    value: "/flower"
  - name: AIRFLOW__API__AUTH_BACKEND
    value: "airflow.api.auth.backend.basic_auth"
  - name: AIRFLOW__CORE__HOSTNAME_CALLABLE
    value: "socket:gethostname"
  - name: AIRFLOW__WEBSERVER__WORKER_REFRESH_BATCH_SIZE
    value: "0"
  - name: AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL
    value: "0"
data:
  metadataSecretName: airflow-metadata-connection
# Airflow scheduler settings
scheduler:
  replicas: 2
  serviceAccount:
    create: false
    name: airflow
  # Scheduler pod disruption budget
  podDisruptionBudget:
    enabled: true
    config:
      maxUnavailable: 1
  resources:
    limits:
      cpu: "4000m"
      memory: "8Gi"
    requests:
      cpu: "1000m"
      memory: "2Gi"
  nodeSelector:
    node-class: worker
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                component: scheduler
            topologyKey: kubernetes.io/hostname
          weight: 100
# Airflow webserver settings
webserver:
  replicas: 1
  serviceAccount:
    create: false
    name: airflow
  resources:
    limits:
      cpu: "4000m"
      memory: "8Gi"
    requests:
      cpu: "1500m"
      memory: "2Gi"
  webserverConfig: |
    import os
    from airflow.configuration import conf
    from flask_appbuilder.security.manager import AUTH_OAUTH
    from flask import Flask
    from flask_appbuilder import SQLA, AppBuilder
    SQLALCHEMY_DATABASE_URI = conf.get("core", "SQL_ALCHEMY_CONN")
    basedir = os.path.abspath(os.path.dirname(__file__))
    AUTH_USER_REGISTRATION_ROLE = "Viewer"
    AUTH_TYPE = AUTH_OAUTH
    AUTH_ROLES_SYNC_AT_LOGIN = True
    AUTH_USER_REGISTRATION = True
    AZURE_TENANT_ID = os.environ.get("AZURE_TENANT_ID")
    API_BASE = f"https://login.microsoftonline.com/{AZURE_TENANT_ID}/oauth2"
    ACCESS_TOKEN_URL = f"{API_BASE}/token"
    AUTHORIZE_URL = f"{API_BASE}/authorize"
    AUTH_ROLES_MAPPING = {
        ***********
    }
    OAUTH_PROVIDERS = [
        {
            "name": "azure",
            "icon": "fa-windows",
            "token_key": "access_token",
            "remote_app": {
                "client_id": os.environ.get("AZURE_CLIENT_ID"),
                "client_secret": os.environ.get("AZURE_CLIENT_SECRET"),
                "api_base_url": API_BASE,
                "client_kwargs": {
                    "resource": os.environ.get("AZURE_CLIENT_ID"),
                    "scope": "User.read name preferred_username email profile upn https://graph.windows.net/.default openid"
                },
                "request_token_url": None,
                "access_token_url": ACCESS_TOKEN_URL,
                "authorize_url": AUTHORIZE_URL,
            },
        }
    ]
# Airflow Triggerer Config
triggerer:
  enabled: true
  # Create ServiceAccount
  serviceAccount:
    create: false
    name: airflow
workers:
  serviceAccount:
    create: false
    name: airflow
  resources:
    limits:
      cpu: "1000m"
      memory: "2Gi"
    requests:
      cpu: "400m"
      memory: "1000Mi"
# Flower settings
flower:
  enabled: false
# Statsd settings
statsd:
  enabled: false
# Configuration for the redis provisioned by the chart
redis:
  enabled: false
postgresql:
  enabled: false
# Git sync
dags:
  gitSync:
    enabled: true
    repo: git@github.com:***********
    branch: development/airflow
    subPath: "dags"
    sshKeySecret: "airflow-git"
    knownHosts: |
      github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
    wait: 120
    resources:
      limits:
        cpu: "100m"
        memory: "400Mi"
      requests:
        cpu: "50m"
        memory: "200Mi"
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #22307: Airflow webserver stops responding and then restarted by liveness probes
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22307:
URL: https://github.com/apache/airflow/issues/22307#issuecomment-1069024463
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on issue #22307: Airflow webserver stops responding and then restarted by liveness probes
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22307:
URL: https://github.com/apache/airflow/issues/22307#issuecomment-1069049135
You need to provide (and ideally analyse) logs from the webserver, and likely from Kubernetes, from around the time the webserver is killed. Ideally you should try to analyse them first and see if you can identify the reason yourself. You should also check how the log files differ from the "normal" situation.
There is no way we can act on this without seeing that information. Converting this into a discussion until more information is available.
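One possible way to compare the logs from an incident against a "normal" period is sketched below. It is only an illustration, not an Airflow tool: the function names (`summarize`, `new_messages`) are hypothetical, and it assumes each log line carries a standard Python level name such as `INFO` or `ERROR`. Digit runs are replaced with `N` so that lines differing only in timestamps, PIDs, or ports collapse into one message shape.

```python
import re
from collections import Counter

# Assumption: log lines contain a standard Python logging level name.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")
# Digit runs (timestamps, PIDs, ports) are masked so similar lines group together.
NUM_RE = re.compile(r"\d+")


def summarize(lines):
    """Return (level counts, normalized-message counts) for an iterable of log lines."""
    levels, messages = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = LEVEL_RE.search(line)
        levels[match.group(1) if match else "UNKNOWN"] += 1
        messages[NUM_RE.sub("N", line)] += 1
    return levels, messages


def new_messages(baseline_lines, incident_lines):
    """Return message shapes seen in the incident log but never in the baseline."""
    _, baseline = summarize(baseline_lines)
    _, incident = summarize(incident_lines)
    return sorted(msg for msg in incident if msg not in baseline)
```

To use it, read a webserver log saved from a healthy period and one saved from around a restart (for example with `open(path).readlines()`), then pass both lists to `new_messages`. The message shapes it returns are only a starting point for the analysis, not a diagnosis.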