Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/27 20:21:47 UTC

[GitHub] [airflow] derkuci commented on pull request #19157: fix "mismatch process ids" issue for the "run as user" case

derkuci commented on pull request #19157:
URL: https://github.com/apache/airflow/pull/19157#issuecomment-1001737855


   I hit the same problem when upgrading from 1.9 to 2.2.  I couldn't figure out a fix and had to give up all use of "run_as_user" (I had to rewrite/rearrange the tasks).  That's a shame.
   
   I tried to go through the airflow code, but with very limited knowledge of its architectural assumptions, I didn't make much progress.  All I can guess is that there's an inconsistency between how a task identifies itself and how it communicates (heartbeat) with the scheduler/db/whatever.  Your proposed code change seems like a good start, but it doesn't fully resolve the inconsistency.
   
   For example, with Celery, the process hierarchy with run_as_user is
   ```
      celery worker process
        \-- (forked) task process
              \-- sudo process
   ```
   In `LocalTaskJob.heartbeat_callback()`, I've seen `ti.pid=<task process pid>` and `current_pid=<sudo process pid>`.  The comparison is actually made between the ppid of `ti.pid`, i.e. `<celery worker process pid>`, and `<sudo process pid>`, which I couldn't understand at all.
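
The comparison described above can be sketched as a toy model. This is not Airflow's actual code: the PIDs are made up, `PARENT_OF` is a hypothetical stand-in for a real parent-process lookup, and `pids_match` is a hypothetical helper that only mirrors the behaviour reported in this comment (with run_as_user, the ppid of `ti.pid` is compared against `current_pid`).

```python
# Made-up PIDs standing in for the Celery hierarchy shown above.
CELERY_WORKER_PID = 100  # celery worker process
TASK_PID = 200           # (forked) task process, observed as ti.pid
SUDO_PID = 300           # sudo process, observed as current_pid

# Hypothetical child -> parent map replacing a real OS lookup.
PARENT_OF = {TASK_PID: CELERY_WORKER_PID, SUDO_PID: TASK_PID}


def pids_match(ti_pid: int, current_pid: int, run_as_user: bool) -> bool:
    """Mimic the reported check: with run_as_user, the parent of ti_pid
    is compared against current_pid; otherwise ti_pid is compared directly."""
    recorded_pid = PARENT_OF[ti_pid] if run_as_user else ti_pid
    return recorded_pid == current_pid


# With the observed values, the worker PID is compared to the sudo PID.
observed = pids_match(TASK_PID, SUDO_PID, run_as_user=True)
print(observed)
```

In this model the check cannot succeed with the observed values, because the celery worker PID is compared with the sudo PID, two different processes in the hierarchy. That matches the mismatch described above; it says nothing about which side of the comparison ought to change.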
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org