Posted to commits@airflow.apache.org by "ut0mt8 (via GitHub)" <gi...@apache.org> on 2023/02/06 14:20:08 UTC

[GitHub] [airflow] ut0mt8 opened a new issue, #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

ut0mt8 opened a new issue, #29389:
URL: https://github.com/apache/airflow/issues/29389

   ### Apache Airflow version
   
   2.5.1
   
   ### What happened
   
   I think this issue has already been discussed in many other issues, but I'm still in a situation where I basically cannot use the KubernetesExecutor...
   
   With this config:
   
   `````
   [core]
   executor = KubernetesExecutor
   
   [scheduler]
   job_heartbeat_sec = 60
   schedule_after_task_execution = False
   scheduler_health_check_threshold = 30
   scheduler_heartbeat_sec = 10
   `````
   
   and this very simple DAG:
   `````
   import datetime
   import time
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   
   
   def sigterm_debug(ds, **kwargs):
       # Sleep well past job_heartbeat_sec (60s) so a heartbeat fires mid-task.
       print("run date is : " + str(ds))
       time.sleep(500)
       print("yeaah not killed ...")
   
   
   dag = DAG(
       dag_id="debug",
       description="Debug",
       schedule_interval="30 21,9 * * *",
       start_date=datetime.datetime(2023, 1, 1),
       catchup=False,
       dagrun_timeout=datetime.timedelta(hours=3),
   )
   
   
   sigterm_debug_task = PythonOperator(
       task_id="sigterm_debug_task",
       python_callable=sigterm_debug,
       execution_timeout=datetime.timedelta(hours=1),
       retries=2,
       dag=dag,
   )
   `````
   
   The DAG is killed by the scheduler, which raises the infamous exception "PID of job runner does not match" at the exact moment the job heartbeat interval (`job_heartbeat_sec`) elapses.
   
   Note: the pod itself is fine. It really is Airflow that kills the task, and the pod then exits with an exit code > 0...
   
   Note 2: increasing `job_heartbeat_sec` to a very high value lets the DAG finish successfully, but then the pod is not reaped until the `job_heartbeat_sec` interval elapses...
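
To make the failure mode concrete, here is a simplified sketch of the kind of check that produces this message: on each job heartbeat, the PID recorded for the running task instance is compared with the PID of the local task runner process. The class and function names below are hypothetical, and this is not Airflow's actual implementation. The real check lives in `airflow/jobs/local_task_job.py` and handles more cases, such as `run_as_user`.

```python
from dataclasses import dataclass
from typing import Optional


class PidMismatchError(RuntimeError):
    """Hypothetical stand-in for the exception Airflow raises."""


@dataclass
class RecordedTaskInstance:
    """Hypothetical, minimal view of what the metadata database stores for a task."""

    state: str
    pid: Optional[int]


def heartbeat_check(recorded: RecordedTaskInstance, runner_pid: int) -> None:
    """Raise if the recorded PID differs from the PID of the task runner process."""
    if recorded.state != "running":
        return  # only running tasks are compared in this sketch
    if recorded.pid is None:
        return  # no PID recorded, nothing to compare
    if recorded.pid != runner_pid:
        raise PidMismatchError("PID of job runner does not match")


# Matching PIDs pass silently; a mismatch aborts the task.
heartbeat_check(RecordedTaskInstance(state="running", pid=42), runner_pid=42)
try:
    heartbeat_check(RecordedTaskInstance(state="running", pid=42), runner_pid=43)
except PidMismatchError as exc:
    print(f"task would be terminated: {exc}")
```

Because the comparison only runs on a heartbeat, a check of this shape would explain why the failure lines up with `job_heartbeat_sec` rather than with anything the task itself does.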
   
   
   
   
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   Everything should be working as expected :) 
   
   ### Operating System
   
   Docker/Kubernetes
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   100% reproducible with this config.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ut0mt8 commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

Posted by "ut0mt8 (via GitHub)" <gi...@apache.org>.
ut0mt8 commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1443784093

   By the way, I ended up monkey-patching Airflow:
   
   https://github.com/apache/airflow/blob/2cd12fc86950646d10dfb5fb7f9c76c529715b46/airflow/jobs/local_task_job.py#L270
   
   Removing this line works for me.
   




[GitHub] [airflow] eladkal closed issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal closed issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"
URL: https://github.com/apache/airflow/issues/29389




[GitHub] [airflow] boring-cyborg[bot] commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1419158114

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] ut0mt8 commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

Posted by "ut0mt8 (via GitHub)" <gi...@apache.org>.
ut0mt8 commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1442012904

   @potiuk does that mean this is a known bug? It's very annoying and makes the Kubernetes executor almost unusable.




[GitHub] [airflow] jedcunningham commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

Posted by "jedcunningham (via GitHub)" <gi...@apache.org>.
jedcunningham commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1472836436

   Hi @ut0mt8,
   
   I naively took your DAG and plopped it into my local env with this config:
   
   ```
   $ cat 29389.yaml                    
   dags:
     persistence:
       enabled: true
       existingClaim: {my dags mount}
                                    
   executor: KubernetesExecutor
   
   config:
     scheduler:
       job_heartbeat_sec: 60
       schedule_after_task_execution: False
       scheduler_health_check_threshold: 30
       scheduler_heartbeat_sec: 10
   ```
   
   (and just to record it here for the future, the latest OSS chart uses 2.5.1 as of today).
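
When comparing environments, it may help to confirm that these settings actually reach the task pod. Airflow also reads configuration from environment variables named `AIRFLOW__{SECTION}__{KEY}`, so the values above can be expressed in that form and compared against what the pod sees. The sketch below only builds the names; the helper `airflow_env_var` is hypothetical and not part of Airflow or the chart.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Settings taken from the config in this issue.
scheduler_settings = {
    "job_heartbeat_sec": "60",
    "schedule_after_task_execution": "False",
    "scheduler_health_check_threshold": "30",
    "scheduler_heartbeat_sec": "10",
}

env = {airflow_env_var("scheduler", k): v for k, v in scheduler_settings.items()}
env[airflow_env_var("core", "executor")] = "KubernetesExecutor"

for name, value in sorted(env.items()):
    print(f"{name}={value}")
```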
   
   Unfortunately, I wasn't able to reproduce your issue. I'm not really sure what might be causing it, but could you put together a full example so we can reproduce this ourselves? DAG, Dockerfile (it would make getting the DAG in easier and more consistent between our environments), Helm values file?
   
   Thanks!




[GitHub] [airflow] potiuk commented on issue #29389: Dag with kubernetes executor are being killed with "PID of job runner does not match"

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29389:
URL: https://github.com/apache/airflow/issues/29389#issuecomment-1442293373

   I have no idea, but I doubt it. I added "needs-triage", and possibly someone will take a look at it. I think adding the milestone was just a mistake.

