Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/16 21:38:38 UTC
[GitHub] [airflow] alexakra opened a new issue, #27080: Git Sync containers are missing when Dag Processor is enabled
alexakra opened a new issue, #27080:
URL: https://github.com/apache/airflow/issues/27080
### Official Helm Chart version
1.7.0 (latest released)
### Apache Airflow version
2.3.4
### Kubernetes Version
1.22.6
### Helm Chart configuration
```
...
# Airflow Dag Processor Config
dagProcessor:
  enabled: true
  # Number of airflow dag processors in the deployment
  replicas: 1
...
# Git sync
dags:
  persistence:
    # Enable persistent volume for storing dags
    enabled: true
    # Volume size for dags
    size: 1Gi
    # If using a custom storageClass, pass name here
    storageClassName:
    # access mode of the persistent volume
    accessMode: ReadWriteOnce
    ## the name of an existing PVC to use
    existingClaim: airflow-dags
    ## optional subpath for dag volume mount
    subPath: ~
  gitSync:
    enabled: true
...
```
### Docker Image customisations
_No response_
### What happened
When the Dag Processor is enabled, the Git Sync containers (git-sync-init and git-sync) are missing from the airflow-scheduler pod.
Nor are they part of the airflow-dag-processor pod.
### What you think should happen instead
The Git Sync containers (git-sync-init and git-sync) should be part of either the airflow-scheduler or the airflow-dag-processor pod.
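For comparison, this is roughly the shape of the git-sync containers the chart normally renders when `dags.gitSync.enabled=true` and the standalone Dag Processor is disabled. This is a trimmed, illustrative sketch only; the exact image tag, env vars, and mount paths come from the chart's defaults and may differ in a real deployment:

```yaml
# Illustrative sketch, not the chart's exact output.
initContainers:
  - name: git-sync-init          # one-shot clone before the main container starts
    image: k8s.gcr.io/git-sync/git-sync:v3.4.0   # assumed default tag
    env:
      - name: GIT_SYNC_ONE_TIME
        value: "true"
    volumeMounts:
      - name: dags
        mountPath: /git
containers:
  - name: git-sync               # sidecar that keeps the DAGs checkout up to date
    image: k8s.gcr.io/git-sync/git-sync:v3.4.0   # assumed default tag
    volumeMounts:
      - name: dags
        mountPath: /git
```

Neither container appears anywhere in the rendered output below.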
Attached below are the relevant parts of the rendered Helm chart output, showing the deployments for both the airflow-scheduler and airflow-dag-processor pods.
```
################################
## Airflow Dag Processor Deployment
#################################
kind: Deployment
apiVersion: apps/v1
metadata:
  name: airflow-dag-processor
  labels:
    tier: airflow
    component: dag-processor
    release: airflow
    chart: "airflow-1.7.0"
    heritage: Helm
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: airflow
      component: dag-processor
      release: airflow
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 50%
  template:
    metadata:
      labels:
        tier: airflow
        component: dag-processor
        release: airflow
      annotations:
        checksum/metadata-secret: 765446a9def21895960ebc0df295399c35d3467cdd84be6ef8bc65c18ca0f7e5
        checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
        checksum/airflow-config: a78767ecb4f7423a34e69c0add978a0ff15452ff0a54a6cb08fc93f06aaf5e7e
        checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
        checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      nodeSelector:
        nodepool-type: airflow
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    component: dag-processor
                topologyKey: kubernetes.io/hostname
              weight: 100
      tolerations: []
      topologySpreadConstraints: []
      terminationGracePeriodSeconds: 60
      restartPolicy: Always
      serviceAccountName: airflow-dag-processor
      securityContext:
        runAsUser: 50000
        fsGroup: 0
      imagePullSecrets:
        - name: ds-registry
      initContainers:
        - name: wait-for-airflow-migrations
          resources:
            limits:
              cpu: 1000m
              memory: 2Gi
            requests:
              cpu: 100m
              memory: 128Mi
          image: seebodsregistry.azurecr.io/airflow:2.3.4-extended
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
          args:
            - airflow
            - db
            - check-migrations
            - --migration-wait-timeout=60
          envFrom: []
          env:
            # Dynamically created environment variables
            # Dynamically created secret envs
            # Extra env
            - name: AIRFLOW__CORE__DAGS_FOLDER
              value: '/opt/airflow/all-dags'
            - name: AIRFLOW__CORE__PLUGINS_FOLDER
              value: '/opt/airflow/plugins/repo/deployment/plugins'
            - name: AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
              value: '600'
            - name: AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT
              value: '3600'
            - name: AIRFLOW__CORE__PARALLELISM
              value: '128'
            - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE
              value: 'True'
            - name: AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__PARSING_PROCESSES
              value: '15'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP
              value: '100'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE
              value: '100'
            - name: AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD
              value: '1200'
            - name: AIRFLOW__WEBSERVER__INSTANCE_NAME
              value: 'Staging'
            - name: AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE
              value: 'True'
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-fernet-key
                  key: fernet-key
            # For Airflow <2.3, backward compatibility; moved to [database] in 2.3
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__WEBSERVER__SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-webserver-key
                  key: webserver-secret-key
      containers:
        - name: dag-processor
          image: seebodsregistry.azurecr.io/airflow:2.3.4-extended
          imagePullPolicy: IfNotPresent
          args:
            - bash
            - -c
            - exec airflow dag-processor
          resources:
            limits:
              cpu: 1000m
              memory: 2Gi
            requests:
              cpu: 100m
              memory: 128Mi
          volumeMounts:
            - name: logs
              mountPath: "/opt/airflow/logs"
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
            - name: config
              mountPath: "/opt/airflow/config/airflow_local_settings.py"
              subPath: airflow_local_settings.py
              readOnly: true
            - name: dags
              mountPath: /opt/airflow/dags
              readOnly: True
          envFrom: []
          env:
            # Dynamically created environment variables
            # Dynamically created secret envs
            # Extra env
            - name: AIRFLOW__CORE__DAGS_FOLDER
              value: '/opt/airflow/all-dags'
            - name: AIRFLOW__CORE__PLUGINS_FOLDER
              value: '/opt/airflow/plugins/repo/deployment/plugins'
            - name: AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
              value: '600'
            - name: AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT
              value: '3600'
            - name: AIRFLOW__CORE__PARALLELISM
              value: '128'
            - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE
              value: 'True'
            - name: AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__PARSING_PROCESSES
              value: '15'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP
              value: '100'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE
              value: '100'
            - name: AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD
              value: '1200'
            - name: AIRFLOW__WEBSERVER__INSTANCE_NAME
              value: 'Staging'
            - name: AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE
              value: 'True'
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-fernet-key
                  key: fernet-key
            # For Airflow <2.3, backward compatibility; moved to [database] in 2.3
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__WEBSERVER__SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-webserver-key
                  key: webserver-secret-key
          livenessProbe:
            initialDelaySeconds: 10
            timeoutSeconds: 20
            failureThreshold: 5
            periodSeconds: 120
            exec:
              command:
                - sh
                - -c
                - |
                  CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
                  airflow jobs check --hostname $(hostname)
      volumes:
        - name: config
          configMap:
            name: airflow-airflow-config
        - name: dags
          persistentVolumeClaim:
            claimName: airflow-dags
        - name: logs
          persistentVolumeClaim:
            claimName: airflow-logs
---
################################
## Airflow Scheduler Deployment/StatefulSet
#################################
# Are we using a local executor?
# Is persistence enabled on the _workers_?
# This is important because in $local mode, the scheduler assumes the role of the worker
# If we're using a StatefulSet
# We can skip DAGs mounts on scheduler if dagProcessor is enabled, except with $local mode
# If we're using elasticsearch logging
kind: Deployment
apiVersion: apps/v1
metadata:
  name: airflow-scheduler
  labels:
    tier: airflow
    component: scheduler
    release: airflow
    chart: "airflow-1.7.0"
    heritage: Helm
    executor: KubernetesExecutor
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: airflow
      component: scheduler
      release: airflow
  template:
    metadata:
      labels:
        tier: airflow
        component: scheduler
        release: airflow
      annotations:
        checksum/metadata-secret: 765446a9def21895960ebc0df295399c35d3467cdd84be6ef8bc65c18ca0f7e5
        checksum/result-backend-secret: 74e3e99feee51248d44224665d60fab543dd6b25ba95f04e6fcb0e5758342056
        checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
        checksum/airflow-config: a78767ecb4f7423a34e69c0add978a0ff15452ff0a54a6cb08fc93f06aaf5e7e
        checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
        checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      nodeSelector:
        nodepool-type: airflow
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    component: scheduler
                topologyKey: kubernetes.io/hostname
              weight: 100
      tolerations: []
      topologySpreadConstraints: []
      restartPolicy: Always
      terminationGracePeriodSeconds: 10
      serviceAccountName: airflow-scheduler
      securityContext:
        runAsUser: 50000
        fsGroup: 0
      imagePullSecrets:
        - name: ds-registry
      initContainers:
        - name: wait-for-airflow-migrations
          resources:
            limits:
              cpu: 8000m
              memory: 8Gi
            requests:
              cpu: 500m
              memory: 1Gi
          image: seebodsregistry.azurecr.io/airflow:2.3.4-extended
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
          args:
            - airflow
            - db
            - check-migrations
            - --migration-wait-timeout=60
          envFrom: []
          env:
            # Dynamically created environment variables
            # Dynamically created secret envs
            # Extra env
            - name: AIRFLOW__CORE__DAGS_FOLDER
              value: '/opt/airflow/all-dags'
            - name: AIRFLOW__CORE__PLUGINS_FOLDER
              value: '/opt/airflow/plugins/repo/deployment/plugins'
            - name: AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
              value: '600'
            - name: AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT
              value: '3600'
            - name: AIRFLOW__CORE__PARALLELISM
              value: '128'
            - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE
              value: 'True'
            - name: AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__PARSING_PROCESSES
              value: '15'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP
              value: '100'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE
              value: '100'
            - name: AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD
              value: '1200'
            - name: AIRFLOW__WEBSERVER__INSTANCE_NAME
              value: 'Staging'
            - name: AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE
              value: 'True'
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-fernet-key
                  key: fernet-key
            # For Airflow <2.3, backward compatibility; moved to [database] in 2.3
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__WEBSERVER__SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-webserver-key
                  key: webserver-secret-key
      containers:
        # Always run the main scheduler container.
        - name: scheduler
          image: seebodsregistry.azurecr.io/airflow:2.3.4-extended
          imagePullPolicy: IfNotPresent
          args:
            - bash
            - -c
            - exec airflow scheduler
          envFrom: []
          env:
            # Dynamically created environment variables
            # Dynamically created secret envs
            # Extra env
            - name: AIRFLOW__CORE__DAGS_FOLDER
              value: '/opt/airflow/all-dags'
            - name: AIRFLOW__CORE__PLUGINS_FOLDER
              value: '/opt/airflow/plugins/repo/deployment/plugins'
            - name: AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT
              value: '600'
            - name: AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT
              value: '3600'
            - name: AIRFLOW__CORE__PARALLELISM
              value: '128'
            - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE
              value: 'True'
            - name: AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL
              value: '120'
            - name: AIRFLOW__SCHEDULER__PARSING_PROCESSES
              value: '15'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_TO_CREATE_PER_LOOP
              value: '100'
            - name: AIRFLOW__SCHEDULER__MAX_DAGRUNS_PER_LOOP_TO_SCHEDULE
              value: '100'
            - name: AIRFLOW__SCHEDULER__SCHEDULER_ZOMBIE_TASK_THRESHOLD
              value: '1200'
            - name: AIRFLOW__WEBSERVER__INSTANCE_NAME
              value: 'Staging'
            - name: AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE
              value: 'True'
            # Hard Coded Airflow Envs
            - name: AIRFLOW__CORE__FERNET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-fernet-key
                  key: fernet-key
            # For Airflow <2.3, backward compatibility; moved to [database] in 2.3
            - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW_CONN_AIRFLOW_DB
              valueFrom:
                secretKeyRef:
                  name: airflow-postgresql
                  key: connection
            - name: AIRFLOW__WEBSERVER__SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: airflow-webserver-key
                  key: webserver-secret-key
          livenessProbe:
            initialDelaySeconds: 10
            timeoutSeconds: 20
            failureThreshold: 5
            periodSeconds: 60
            exec:
              command:
                - sh
                - -c
                - |
                  CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
                  airflow jobs check --job-type SchedulerJob --hostname $(hostname)
          resources:
            limits:
              cpu: 8000m
              memory: 8Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - name: config
              mountPath: /opt/airflow/pod_templates/pod_template_file.yaml
              subPath: pod_template_file.yaml
              readOnly: true
            - name: logs
              mountPath: "/opt/airflow/logs"
            - name: config
              mountPath: "/opt/airflow/airflow.cfg"
              subPath: airflow.cfg
              readOnly: true
            - name: config
              mountPath: "/opt/airflow/config/airflow_local_settings.py"
              subPath: airflow_local_settings.py
              readOnly: true
        - name: scheduler-log-groomer
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 128Mi
          image: seebodsregistry.azurecr.io/airflow:2.3.4-extended
          imagePullPolicy: IfNotPresent
          args:
            - bash
            - /clean-logs
          env:
            - name: AIRFLOW__LOG_RETENTION_DAYS
              value: "15"
          volumeMounts:
            - name: logs
              mountPath: "/opt/airflow/logs"
      volumes:
        - name: config
          configMap:
            name: airflow-airflow-config
        - name: logs
          persistentVolumeClaim:
            claimName: airflow-logs
```
### How to reproduce
Enable both Git Sync (`dags.gitSync.enabled: true`) and the Dag Processor (`dagProcessor.enabled: true`) in the chart values, then render or deploy the chart and inspect the scheduler and dag-processor pods.
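The template comments in the rendered scheduler manifest above ("We can skip DAGs mounts on scheduler if dagProcessor is enabled, except with $local mode") hint at the cause. Below is a rough, hypothetical Python model of the suspected chart-1.7.0 rendering decision, not the chart's actual Helm code, under the assumption that the dag-processor template simply never renders git-sync containers:

```python
def pods_with_git_sync(executor: str, git_sync: bool, dag_processor: bool) -> set:
    """Hypothetical model of which pods receive git-sync containers.

    Sketch of the suspected template logic in chart 1.7.0: the scheduler
    skips DAG mounts (including git-sync) once the standalone dag processor
    is enabled, unless a local executor means the scheduler also runs tasks.
    The dag-processor template, per this report, never renders git-sync.
    """
    if not git_sync:
        return set()
    local = executor in ("LocalExecutor", "LocalKubernetesExecutor")
    pods = set()
    # Scheduler gets git-sync only without a standalone dag processor,
    # or when running in local mode.
    if not dag_processor or local:
        pods.add("scheduler")
    # Bug: no branch ever adds "dag-processor" here.
    return pods

# The reporter's configuration (KubernetesExecutor + dagProcessor enabled):
print(pods_with_git_sync("KubernetesExecutor", git_sync=True, dag_processor=True))  # set()
```

Under this model, the reporter's combination of KubernetesExecutor with the Dag Processor enabled is exactly the case where no pod is given the git-sync containers.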
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] Git Sync containers are missing when Dag Processor is enabled [airflow]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #27080:
URL: https://github.com/apache/airflow/issues/27080#issuecomment-2006007842
This issue has been closed because it has not received a response from the issue author.
Re: [I] Git Sync containers are missing when Dag Processor is enabled [airflow]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #27080: Git Sync containers are missing when Dag Processor is enabled
URL: https://github.com/apache/airflow/issues/27080
Re: [I] Git Sync containers are missing when Dag Processor is enabled [airflow]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #27080:
URL: https://github.com/apache/airflow/issues/27080#issuecomment-1935426209
This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue; please recheck the report against the latest Airflow version and let us know whether it is still reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.
[GitHub] [airflow] potiuk commented on issue #27080: Git Sync containers are missing when Dag Processor is enabled
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #27080:
URL: https://github.com/apache/airflow/issues/27080#issuecomment-1376849039
Related #27545 #27476 #27124