Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/11 15:16:13 UTC

[GitHub] [airflow] scaoupgrade commented on issue #8212: Can't read S3 remote logs when using gevent/eventlet webserver workers.

scaoupgrade commented on issue #8212:
URL: https://github.com/apache/airflow/issues/8212#issuecomment-796810511


   > > Gevent is not something Airflow uses though - and airflow_local_settings is a user-controlled file.
   > 
   > Based on my very brief look, gevent is currently used only as an alternative for webserver request handling (so it's true that this entire issue is only about reading S3 logs in the webserver component; the rest - the scheduler, and Airflow workers if there are any - work correctly, shouldn't be touched at all, and this monkeypatching can actually break them instead).
   > About airflow_local_settings - I think that's not true, because of https://github.com/apache/airflow/blob/1.10.14/airflow/settings.py#L416. Attention: it doesn't matter whether we ever call methods in airflow_local_settings; what matters is package import order, where gevent monkeypatching should be the first call in the entrypoint, or as early as possible (in this case, before botocore and requests are imported - a minimal sketch of this follows the quote).
   > 
   > What I mentioned is more of a "quick and dirty" fix that I found by reading the stack trace and brute-forcing, since I have only a small picture of how the startup flow of the Webserver process goes in Airflow (especially where gevent worker forking impacts the result). Only a dev who clearly understands where everything begins for a `forked` gevent worker can say where to include this monkeypatching, and they would also have to ensure that it happens **only for gevent**, as doing the same for the other webserver engine types (sync, eventlet) would cause issues instead - a sketch of such gating also follows below.
   > 
   > PS: in any case, since the issue is present and might be critical for infrastructure, it's unreasonable to keep it closed regardless of how difficult it is to fix. (It's not difficult; it just requires a thorough look.)
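   
   To make the import-order point above concrete, here is a minimal sketch of a hypothetical entrypoint (not Airflow's actual startup code); it assumes only gevent's documented `monkey.patch_all()` and ordinary botocore/requests imports:
   
   ```python
   # gevent's monkeypatching swaps out the stdlib socket/ssl modules, so it
   # must run before any library that binds the originals at import time.
   from gevent import monkey
   monkey.patch_all()  # first call in the process entrypoint
   
   import requests          # now resolves against the patched socket/ssl
   import botocore.session  # likewise
   
   # Had requests/botocore been imported first, they would hold references to
   # the unpatched blocking primitives - the failure mode described above.
   session = botocore.session.get_session()
   s3 = session.create_client("s3")
   ```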
   
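   And a hedged sketch of the gevent-only gating described in the quote. The AIRFLOW__WEBSERVER__WORKER_CLASS environment variable follows Airflow's standard AIRFLOW__SECTION__KEY mapping for the [webserver] worker_class option, but where exactly such a check would live in the startup flow is an assumption:
   
   ```python
   import os
   
   # Only monkeypatch when the webserver is actually configured to run the
   # gevent gunicorn worker class; sync and eventlet workers are left alone.
   if os.environ.get("AIRFLOW__WEBSERVER__WORKER_CLASS") == "gevent":
       from gevent import monkey
       monkey.patch_all()
   ```
   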
   We are using the Celery worker for Airflow, and S3 logging is broken. It seems to be Celery's gevent patching that caused the issue: https://github.com/celery/celery/blob/6eb5b718843d69e31bb2c90e3efa2e2aa39f5f94/celery/__init__.py#L114
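   
   For anyone debugging the same setup, a small diagnostic sketch (not part of Airflow or Celery; it relies only on gevent's documented `monkey.is_module_patched`) can confirm from inside a running worker whether that early patching has taken effect:
   
   ```python
   def gevent_patch_report() -> dict:
       """Report which stdlib modules gevent has monkeypatched, if any."""
       try:
           from gevent import monkey
       except ImportError:
           return {}  # gevent not installed, so nothing was patched
       return {mod: monkey.is_module_patched(mod)
               for mod in ("socket", "ssl", "select", "threading")}
   ```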

