Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/31 15:57:22 UTC

[GitHub] [airflow] zachary-naylor opened a new issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

zachary-naylor opened a new issue #15116:
URL: https://github.com/apache/airflow/issues/15116


   **Apache Airflow version**: 2.0.1
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A
   
   **Environment**:
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release): Ubuntu WSL 20.04.2 LTS / Windows 10 19041.844
   - **Kernel** (e.g. `uname -a`):  4.4.0-19041-Microsoft x86_64 GNU/Linux
   - **Install tools**:
   - **Others**: Local debug via WSL1 backed Docker Engine v20.10.5 / docker-compose 1.28.4 (running apache/airflow:2.0.1-python3.8)
   
   **What happened**: 
   With ```AIRFLOW__CELERY__POOL``` set to 'eventlet', logs are neither written to nor retrieved from the defined S3 remote bucket (remote logging is enabled and the 'aws_default' connection exists). 
   
   This produces both of the following error messages in the worker console:
   - Could not verify previous log to append: maximum recursion depth exceeded while calling a Python object
   - Could not write logs to s3://####/logs/airflow2/trial_dag_airflow2/gen_extract_trial/2021-03-31T13:38:24.330747+00:00/1.log 
   ```
   [2021-03-31 13:38:26,934: INFO/MainProcess] Airflow Connection: aws_conn_id=aws_default
   [2021-03-31 13:38:26,944: INFO/MainProcess] No credentials retrieved from Connection
   [2021-03-31 13:38:26,944: INFO/MainProcess] Creating session with aws_access_key_id=None region_name=None
   [2021-03-31 13:38:26,962: INFO/MainProcess] role_arn is None
   [2021-03-31 13:38:26,963: ERROR/MainProcess] Could not write logs to s3://####/logs/airflow2/trial_dag_airflow2/gen_extract_trial/2021-03-31T13:38:24.330747+00:00/1.log
   Traceback (most recent call last):
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/log/s3_task_handler.py", line 186, in s3_write
   self.hook.load_string(
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 61, in wrapper
   return func(*bound_args.args, **bound_args.kwargs)
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 90, in wrapper
   return func(*bound_args.args, **bound_args.kwargs)
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 563, in load_string
   self._upload_file_obj(file_obj, key, bucket_name, replace, encrypt, acl_policy)
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/s3.py", line 653, in _upload_file_obj
   client = self.get_conn()
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/base_aws.py", line 455, in get_conn
   return self.conn
   File "/home/airflow/.local/lib/python3.8/site-packages/cached_property.py", line 36, in __get__
   value = obj.__dict__[self.func.__name__] = self.func(obj)
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/base_aws.py", line 437, in conn
   return self.get_client_type(self.client_type, region_name=self.region_name)
   File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/amazon/aws/hooks/base_aws.py", line 410, in get_client_type
   return session.client(client_type, endpoint_url=endpoint_url, config=config, verify=self.verify)
   File "/home/airflow/.local/lib/python3.8/site-packages/boto3/session.py", line 258, in client
   return self._session.create_client(
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/session.py", line 826, in create_client
   credentials = self.get_credentials()
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/session.py", line 430, in get_credentials
   self._credentials = self._components.get_component(
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/session.py", line 924, in get_component
   self._components[name] = factory()
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/session.py", line 151, in _create_credential_resolver
   return botocore.credentials.create_credential_resolver(
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 72, in create_credential_resolver
   container_provider = ContainerProvider()
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/credentials.py", line 1817, in __init__
   fetcher = ContainerMetadataFetcher()
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/utils.py", line 1976, in __init__
   session = botocore.httpsession.URLLib3Session(
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/httpsession.py", line 180, in __init__
   self._manager = PoolManager(**self._get_pool_manager_kwargs())
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/httpsession.py", line 188, in _get_pool_manager_kwargs
   'ssl_context': self._get_ssl_context(),
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/httpsession.py", line 197, in _get_ssl_context
   return create_urllib3_context()
   File "/home/airflow/.local/lib/python3.8/site-packages/botocore/httpsession.py", line 72, in create_urllib3_context
   context.options |= options
   File "/usr/local/lib/python3.8/ssl.py", line 602, in options
   super(SSLContext, SSLContext).options.__set__(self, value)
   File "/usr/local/lib/python3.8/ssl.py", line 602, in options
   super(SSLContext, SSLContext).options.__set__(self, value)
   File "/usr/local/lib/python3.8/ssl.py", line 602, in options
   super(SSLContext, SSLContext).options.__set__(self, value)
   [Previous line repeated 456 more times]
   RecursionError: maximum recursion depth exceeded while calling a Python object
   ```
   Recursion errors are also recorded in the Airflow UI logs, together with a 'Falling back to local log' entry. When run through AWS ECS, no logs are accessible within the UI.
   ```
   *** Falling back to local log
   *** Reading local file: /opt/airflow/logs/trial_dag_airflow2/gen_extract_trial/2021-03-31T13:38:27.146858+00:00/1.log
   ```
   
   **What you expected to happen**: 
   Logs should be written to the defined S3 bucket, without recursion errors, using credentials within the mounted /home/airflow/.aws directory or inherited from the ECS task operator.
   
   When 'prefork' is used, it appears an attempt is made to locate credentials within the mounted volume:
   ```
   [2021-03-31 14:41:22,863: INFO/ForkPoolWorker-15] Found credentials in shared credentials file: ~/.aws/credentials
   ```
   
   **How to reproduce it**:
   * Set and export the AWS_PROFILE variable (```AWS_PROFILE="dev_access"; export AWS_PROFILE```)
   * Use the docker-compose setup (https://github.com/apache/airflow/blob/master/docs/apache-airflow/start/docker-compose.yaml)
   * Append or update docker-compose [x-airflow-common] with the following:
   ```
   image: apache/airflow:2.0.1-python3.8
   environment:
     AIRFLOW__CORE__LOAD_EXAMPLES: 'False'
     AIRFLOW__CELERY__POOL: 'eventlet'
     AIRFLOW__LOGGING__REMOTE_LOGGING: 'True'
     AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: 'aws_default'
     AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: 's3://####/logs/airflow2'
     AWS_PROFILE: ${AWS_PROFILE}
   volumes:
     -  /c/Users/user.name/.aws:/home/airflow/.aws
   ```
   * Start the docker-compose containers. When started, ensure the following connection exists: 
   ```
   * conn_id: aws_default
   * conn_type: aws
   ```
   * Two test DAGs were used. The first consists of a simple DAG with a PythonOperator that calls a function containing a print statement (a minimal sketch is included after this list). The second DAG is identical, but with the following code embedded within the function instead of the print statement.
   ```
       import boto3
   
       s3 = boto3.client("s3")
       s3.put_object(
           Bucket="####",
           Body="Test content from Airflow2",
            Key="var/airflow2/test_log_file.txt",
       )
   ```
   * Running either of the two DAGs results in logs that cannot be read from or written to S3. With the second DAG, recursion errors are also encountered in the local log.
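   
   A minimal sketch of the first test DAG described above (the DAG id and task id are taken from the log paths in the report; the schedule and function name are illustrative assumptions):
   ```
   from datetime import datetime
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   
   
   def print_message():
       # The first test DAG simply prints a message; the second replaces this
       # body with the boto3 put_object call shown above.
       print("Test content from Airflow2")
   
   
   with DAG(
       dag_id="trial_dag_airflow2",
       start_date=datetime(2021, 3, 1),
       schedule_interval=None,
       catchup=False,
   ) as dag:
       PythonOperator(task_id="gen_extract_trial", python_callable=print_message)
   ```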
   
   **Anything else we need to know**: 
   This has been reproduced with both the ```apache/airflow:2.0.1-python3.8``` and ```apache/airflow:2.0.0-python3.8``` docker images. I have not tried alternative Python versions.
   
   When the AIRFLOW__CELERY__POOL is set to prefork or solo, no recursion errors are encountered, and logs can be read from and written to S3.
   
   Running the same test with the docker images [```apache/airflow:1.10.12-python3.8``` ```apache/airflow:1.10.14-python3.8``` ```apache/airflow:1.10.15-python3.8```], the amended variables below, and Airflow 1.x compatible DAGs succeeds in reading from and writing to S3 without recursion errors.
   ```
   environment:
     AIRFLOW__CORE__LOAD_EXAMPLES: 'False'
     AIRFLOW__CELERY__POOL: 'eventlet'
     AIRFLOW__CORE__REMOTE_LOGGING: 'True'
     AIRFLOW__CORE__REMOTE_LOG_CONN_ID: 'aws_default'
     AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: 's3://####/logs/airflow'
     AWS_PROFILE: ${AWS_PROFILE}
   volumes:
     -  /c/Users/user.name/.aws:/home/airflow/.aws
   ```
   Upgrading eventlet from 0.30.1 to 0.30.2 has no effect. Conversely, downgrading to gevent 1.5.0 and eventlet 0.25.2 has also had no effect.
   
   Adding airflow_local_settings.py to the ${AIRFLOW_HOME} directory within the Docker image, with variations of the following code at the top (following suggestions), has had no effect.
   ```
   import eventlet.debug
   eventlet.monkey_patch()
   ```
   ```
   from gevent import monkey
   monkey.patch_ssl()
   ```
   ```
   from gevent import monkey
   monkey.patch_all()
   ```
   This may be related to https://github.com/apache/airflow/issues/8212


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811755486


   Given how long https://github.com/pallets/flask PR 3412 has been open (deliberately not linked, to avoid needlessly associating the PRs), you may be right





[GitHub] [airflow] zachary-naylor commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
zachary-naylor commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811247340


   gevent == 21.1.2
   I noted this is the same version used within the ```apache/airflow:1.10.15-python3.8``` docker image





[GitHub] [airflow] uranusjr commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811258624


   As a hack, maybe it would be possible to call something on the hook before gevent patches? The connection object (creation of which triggers `SSLContext` modifications) is cached, so once it’s created, subsequent S3 calls won’t modify `SSLContext` (I think).
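   
   A rough sketch of that idea (purely illustrative: it assumes the warm-up code actually runs before the Celery pool monkeypatches ssl, and note the connection is cached per hook instance, so it would only help where the same hook object is later reused for the log writes):
   ```
   from airflow.providers.amazon.aws.hooks.s3 import S3Hook
   
   # Hypothetical warm-up: building the boto3 client here constructs the
   # SSLContext before eventlet/gevent patch the ssl module.
   _warmup_hook = S3Hook(aws_conn_id="aws_default")
   _warmup_hook.get_conn()
   ```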





[GitHub] [airflow] ashb commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811450727


   Long term the "fix" for us is probably to wait for Flask to support asyncio and then to drop support for gevent/eventlet.





[GitHub] [airflow] uranusjr commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811254578


   Seems like this is a known gevent issue: gevent/gevent#1777 and gevent/gevent#1531. Note that both issues have been closed without a fix.
   
   Reading the issues, it seems like this happens when the monkeypatch happens before an option update, so a fix in Airflow would need to delay the patch until after all possible context modifications. But since those modifications come from providers, they can literally happen anywhere, so unfortunately I don't see a good way to fix this in Airflow. Hopefully someone can provide a workable solution.
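   
   For illustration, the ordering described above amounts to something like this (a minimal sketch, not Airflow code; whether it actually triggers the recursion depends on the eventlet/Python combination):
   ```
   import eventlet
   
   eventlet.monkey_patch()  # patch first...
   
   import ssl
   
   ctx = ssl.create_default_context()
   # ...then update SSLContext options afterwards, as botocore's
   # create_urllib3_context does. On affected combinations the options
   # setter recurses until RecursionError is raised.
   ctx.options |= ssl.OP_NO_TLSv1
   ```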





[GitHub] [airflow] zachary-naylor commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
zachary-naylor commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-815839510


   It would appear they heard you; Flask's PR 3412 has since been merged, adding asyncio support.
   
   I have been trialing several env vars, and setting ```AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER: 'True'``` makes the recursion errors disappear, with eventlet worker tasks executing; this suggests an issue around the ```celery_executor._execute_in_fork``` function.
   
   However, whilst eventlet workers now run the tasks without recursion errors, the tasks are executed sequentially rather than asynchronously, defeating the purpose of using eventlet in the first place.





[GitHub] [airflow] uranusjr commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811710907


   > Flask to support asyncio
   
   Personally I doubt that would ever happen, to be honest. But that’s a whole other issue…





[GitHub] [airflow] uranusjr commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811240381


   What is the gevent version you're using?





[GitHub] [airflow] boring-cyborg[bot] commented on issue #15116: Logging to S3 fails using eventlet workers (maximum recursion depth exceeded)

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15116:
URL: https://github.com/apache/airflow/issues/15116#issuecomment-811178383


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

