Posted to commits@airflow.apache.org by "ut0mt8 (via GitHub)" <gi...@apache.org> on 2023/02/06 14:20:08 UTC
[GitHub] [airflow] ut0mt8 opened a new issue, #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
ut0mt8 opened a new issue, #29389:
URL: https://github.com/apache/airflow/issues/29389
### Apache Airflow version
2.5.1
### What happened
I think this issue has already been discussed in many issues, but I'm still in a situation where I basically cannot use the Kubernetes executor...
with this config:
`````
[core]
executor = KubernetesExecutor
[scheduler]
job_heartbeat_sec = 60
schedule_after_task_execution = False
scheduler_health_check_threshold = 30
scheduler_heartbeat_sec = 10
`````
and this very simple DAG:
`````
import datetime
import time

from airflow import DAG
# airflow.operators.python_operator is a deprecated alias in Airflow 2.x
from airflow.operators.python import PythonOperator


def sigterm_debug(ds, **kwargs):
    # Print the run date, then sleep long enough to span several job heartbeats
    run_date = str(ds)
    print("run date is : " + run_date)
    time.sleep(500)
    print("yeaah not killed ...")


dag = DAG(
    dag_id="debug",
    description="Debug",
    schedule_interval="30 21,9 * * *",
    start_date=datetime.datetime(2023, 1, 1),
    catchup=False,
    dagrun_timeout=datetime.timedelta(hours=3),
)

sigterm_debug_task = PythonOperator(
    task_id="sigterm_debug_task",
    python_callable=sigterm_debug,
    execution_timeout=datetime.timedelta(hours=1),
    retries=2,
    dag=dag,
)

sigterm_debug_task
`````
The DAG is killed by the scheduler, raising the infamous exception "PID of job runner does not match", at the exact moment the job heartbeat (job_heartbeat_sec) fires.
Note: the pod itself is fine. It really is Airflow that kills the task, and then the pod exits with an exit code > 0...
Note 2: increasing "job_heartbeat_sec" to something very high lets the DAG finish successfully, but then the pod is never reaped before the job_heartbeat_sec interval elapses...
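For readers unfamiliar with the check involved, here is a minimal sketch of the kind of comparison that produces this error. It is a simplified illustration, not Airflow's actual source: the function name `heartbeat_pid_check` and the exception class `JobRunnerPidMismatch` are hypothetical, and the real check lives in the heartbeat callback of `airflow/jobs/local_task_job.py` and handles more cases (for example `run_as_user`). The idea is that, on each job heartbeat, the PID recorded for the task instance is compared with the PID of the process currently running it, and a mismatch terminates the task.

```python
import os
from typing import Optional


class JobRunnerPidMismatch(RuntimeError):
    """Hypothetical stand-in for the exception raised by the real check."""


def heartbeat_pid_check(recorded_pid: Optional[int], current_pid: int) -> None:
    """Raise if the PID recorded for the task differs from the running PID.

    Simplified illustration only; names and structure are assumptions.
    """
    # Nothing has been recorded yet, so there is nothing to compare.
    if recorded_pid is None:
        return
    if recorded_pid != current_pid:
        raise JobRunnerPidMismatch("PID of job runner does not match")


if __name__ == "__main__":
    current = os.getpid()
    heartbeat_pid_check(None, current)      # no recorded PID: passes
    heartbeat_pid_check(current, current)   # matching PIDs: passes
    try:
        heartbeat_pid_check(current + 1, current)
    except JobRunnerPidMismatch as exc:
        print(f"task would be terminated: {exc}")
```

Because a check of this kind runs once per heartbeat, it would be consistent with the failure appearing exactly when the `job_heartbeat_sec` interval elapses.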
### What you think should happen instead
_No response_
### How to reproduce
Everything should be working as expected :)
### Operating System
Docker/Kubernetes
### Versions of Apache Airflow Providers
_No response_
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### Anything else
100% reproducible with this config.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ut0mt8 commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
Posted by "ut0mt8 (via GitHub)" <gi...@apache.org>.
ut0mt8 commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1443784093
By the way, I ended up monkey patching Airflow:
https://github.com/apache/airflow/blob/2cd12fc86950646d10dfb5fb7f9c76c529715b46/airflow/jobs/local_task_job.py#L270
Removing this line works for me.
[GitHub] [airflow] eladkal closed issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal closed issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
URL: https://github.com/apache/airflow/issues/29389
[GitHub] [airflow] boring-cyborg[bot] commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1419158114
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] ut0mt8 commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
Posted by "ut0mt8 (via GitHub)" <gi...@apache.org>.
ut0mt8 commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1442012904
@potiuk does that mean this is a known bug? It's very, very annoying and makes the Kubernetes executor quasi-unusable.
[GitHub] [airflow] jedcunningham commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
Posted by "jedcunningham (via GitHub)" <gi...@apache.org>.
jedcunningham commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1472836436
Hi @ut0mt8,
I naively took your DAG and plopped it into my local env with this config:
```
$ cat 29389.yaml
dags:
  persistence:
    enabled: true
    existingClaim: {my dags mount}
executor: KubernetesExecutor
config:
  scheduler:
    job_heartbeat_sec: 60
    schedule_after_task_execution: False
    scheduler_health_check_threshold: 30
    scheduler_heartbeat_sec: 10
```
(and just to record it here for the future, the latest OSS chart uses 2.5.1 as of today).
Unfortunately, I wasn't able to reproduce your issue. I'm not really sure what might be causing it, but could you put together a full example so we can reproduce this ourselves? DAG, Dockerfile (it'd make getting the DAG in easier, and more consistent between our envs), Helm values files?
Thanks!
[GitHub] [airflow] potiuk commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1442293373
I have no idea, but I doubt it. I added "needs-triage" and possibly someone will take a look at it. I think adding the milestone was just a mistake.