Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/02 11:06:59 UTC

[GitHub] [airflow] zacharya19 edited a comment on issue #13364: Add virtual env creation in Celery workers

zacharya19 edited a comment on issue #13364:
URL: https://github.com/apache/airflow/issues/13364#issuecomment-753322802


   > Can you please describe in detail the algorithm you want to propose for upgrading/managing the venv?
   
   As a first step, I suggest a simple solution:
   When a Celery worker receives a new task, it extracts the venv config from `executor_config` (`venv_name: str` and `packages: List[str]`). If the config exists, the worker first takes a simple file-based flock (`/tmp/{venv_name}`), creates the virtualenv if it doesn't exist (which will include system-site-packages or copy another venv, as discussed), then runs `pip install --upgrade` for all packages (always, which will slow things down a bit), and releases the lock.
   After that, it activates the venv and continues with the same process as usual (`airflow run ...`).
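
   To make the steps concrete, here is a rough sketch of the locking and install sequence, not a proposed implementation. The function name `prepare_venv`, the `venv_root` location, and the injectable `run` callable are all made up for illustration, and it uses the stdlib `venv` module instead of `virtualenv` to stay self-contained. It relies on `fcntl`, so it is Unix-only.

```python
import fcntl
import os
import subprocess
import sys
from typing import Callable, List, Optional, Sequence


def prepare_venv(
    executor_config: dict,
    venv_root: str = "/tmp/airflow-venvs",  # hypothetical location for the venvs
    lock_dir: str = "/tmp",                 # lock file is {lock_dir}/{venv_name}
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> Optional[str]:
    """Create/upgrade the venv under a file lock; return its path or None."""
    venv_name = executor_config.get("venv_name")
    if not venv_name:
        return None  # no venv requested, run the task as usual

    packages: List[str] = executor_config.get("packages", [])
    venv_path = os.path.join(venv_root, venv_name)
    lock_path = os.path.join(lock_dir, venv_name)
    os.makedirs(venv_root, exist_ok=True)

    with open(lock_path, "w") as lock_file:
        # Blocks until no other worker process on this host holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if not os.path.isdir(venv_path):
                run([sys.executable, "-m", "venv",
                     "--system-site-packages", venv_path])
            if packages:
                pip = os.path.join(venv_path, "bin", "pip")
                # Always upgrade, as described above (slower but simple).
                run([pip, "install", "--upgrade", *packages])
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return venv_path
```

   The worker would then launch the task with the interpreter from the returned path. Note that the lock only serializes processes on the same host, which is enough here because each worker host has its own venv.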
   
   This solution is fairly simple to implement and covers the basic problems I described in the issue.
   As for other tasks overriding versions and messing things up: I think that, as a first step for this feature, it's the user's problem (in the same way that Airflow doesn't validate that all workers run the same Airflow version), and we can maybe provide a warning to let them know. The user can either pin the same version in all of the tasks (maybe by using Airflow variables) or always use the latest.
   I understand that this puts a lot on the user to make it work, but it's an easy solution that doesn't require much work.
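
   For the warning, something like the following could be enough as a starting point. This is only a sketch: `find_pin_conflicts` is a hypothetical helper, it assumes the package lists of all tasks sharing a venv are available in one place, and it only understands plain `name==version` pins (anything without `==` is treated as "latest").

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_pin_conflicts(task_packages: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return packages that tasks sharing a venv request in differing versions.

    `task_packages` maps a task id to its `packages` list. Only exact
    `name==version` pins are parsed; unpinned entries count as "latest".
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for packages in task_packages.values():
        for spec in packages:
            name, sep, version = spec.partition("==")
            seen[name.strip().lower()].add(version.strip() if sep else "latest")
    # A package is in conflict when more than one distinct version was requested.
    return {name: versions for name, versions in seen.items() if len(versions) > 1}
```

   A non-empty result would be the trigger for the warning, for example when one task pins `pandas==1.1.5` and another pins `pandas==1.2.0` in the same venv.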
   
   > so it's likely better to solve them as 'general' case.
   
   I'm all for starting to think about a generic solution, but I don't think this feature has to be bulletproof. If you still think my solution isn't good enough, I can try to think of something else, or maybe propose a solution for a generic caching manager.

