Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/16 21:17:03 UTC

[GitHub] [airflow] Archit-Shah opened a new issue #21623: Stackdriver Remote Logging - Only partial log content captured on Google Cloud Logging

Archit-Shah opened a new issue #21623:
URL: https://github.com/apache/airflow/issues/21623


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   I am trying to configure **remote logging** with Stackdriver on my local Airflow setup (official Docker Compose). I have updated the following config variables (output from `airflow config list | grep remote`):
   
   ```
   remote_logging = True
   remote_log_conn_id = google_cloud_default
   remote_base_log_folder = stackdriver://airflow-tasks
   ```
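
   Note that in a docker-compose deployment each container resolves its configuration independently, so it is worth confirming the worker and scheduler containers actually picked these values up. A minimal check, run inside a container (a sketch assuming a standard Airflow 2.x install):

   ```
   from airflow.configuration import conf

   # Print the effective remote-logging settings as Airflow resolves them,
   # after AIRFLOW__LOGGING__* env vars and airflow.cfg are merged.
   print(conf.getboolean("logging", "remote_logging"))
   print(conf.get("logging", "remote_log_conn_id"))
   print(conf.get("logging", "remote_base_log_folder"))
   ```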
   
   The GCP connection has been set up properly in Airflow. The service account has the correct set of permissions to read and write logs.
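
   A quick way to verify the service account can both write and read entries is a standalone script with the google-cloud-logging client. This is a sketch: it assumes GOOGLE_APPLICATION_CREDENTIALS points at the same key the google_cloud_default connection uses, and that the handler writes under the `airflow-tasks` log name, as the `stackdriver://airflow-tasks` URL suggests:

   ```
   from google.cloud import logging

   client = logging.Client()

   # Write a test entry under the same log name the Stackdriver handler uses.
   client.logger("airflow-tasks").log_text("permissions smoke test")

   # Read it back (new entries can take a few seconds to become queryable).
   log_filter = f'logName="projects/{client.project}/logs/airflow-tasks"'
   for entry in client.list_entries(filter_=log_filter):
       print(entry.timestamp, entry.payload)
   ```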
   
   I am noticing that the remote logger only captures partial logs. Here's an example log:
   
   ```
   [2022-02-16 18:51:43,440] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: test_dag.hello_world manual__2022-02-16T18:51:42.870907+00:00 [queued]>
   [2022-02-16 18:51:43,464] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: test_dag.hello_world manual__2022-02-16T18:51:42.870907+00:00 [queued]>
   [2022-02-16 18:51:43,464] {taskinstance.py:1238} INFO - --------------------------------------------------------------------------------
   [2022-02-16 18:51:43,465] {taskinstance.py:1239} INFO - Starting attempt 1 of 1
   [2022-02-16 18:51:43,465] {taskinstance.py:1240} INFO - --------------------------------------------------------------------------------
   [2022-02-16 18:51:43,477] {taskinstance.py:1259} INFO - Executing <Task(BashOperator): hello_world> on 2022-02-16 18:51:42.870907+00:00
   [2022-02-16 18:51:43,481] {standard_task_runner.py:52} INFO - Started process 267 to run task
   [2022-02-16 18:51:48,522] {local_task_job.py:211} WARNING - State of this instance has been externally set to success. Terminating instance.
   [2022-02-16 18:51:48,525] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 267. PIDs of all processes in the group: [267]
   [2022-02-16 18:51:48,525] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 267
   [2022-02-16 18:51:48,539] {process_utils.py:70} INFO - Process psutil.Process(pid=267, status='terminated', exitcode=0, started='18:51:42') (267) terminated with exit code 0
   ```
   
   These logs are for a simple hello-world Bash task; code snippet below:
   
   ```
   hello_world = BashOperator(
       task_id='hello_world',
       bash_command='echo "Hello World"',
   )
   ```
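
   For reference, a minimal complete DAG around this task might look like the following (the start date and schedule here are illustrative, not the exact values from my setup):

   ```
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.bash import BashOperator

   with DAG(
       dag_id='test_dag',
       start_date=datetime(2022, 2, 1),
       schedule_interval=None,  # triggered manually, matching the manual__ run ids above
       catchup=False,
   ) as dag:
       hello_world = BashOperator(
           task_id='hello_world',
           bash_command='echo "Hello World"',
       )
   ```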
   
   
   
   
   ### What you expected to happen
   
   I would expect the Stackdriver remote logger on Airflow to capture the full logs for the DAG.
   
   When I run the same DAG (from the What happened section above) with remote logging turned off, here are the logs I see, as expected:
   
   ```
   [2022-02-16 18:35:36,814] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: test_dag.hello_world manual__2022-02-16T18:35:35.078685+00:00 [queued]>
   [2022-02-16 18:35:36,824] {taskinstance.py:1032} INFO - Dependencies all met for <TaskInstance: test_dag.hello_world manual__2022-02-16T18:35:35.078685+00:00 [queued]>
   [2022-02-16 18:35:36,825] {taskinstance.py:1238} INFO - 
   --------------------------------------------------------------------------------
   [2022-02-16 18:35:36,826] {taskinstance.py:1239} INFO - Starting attempt 1 of 1
   [2022-02-16 18:35:36,827] {taskinstance.py:1240} INFO - 
   --------------------------------------------------------------------------------
   [2022-02-16 18:35:36,835] {taskinstance.py:1259} INFO - Executing <Task(BashOperator): hello_world> on 2022-02-16 18:35:35.078685+00:00
   [2022-02-16 18:35:36,839] {standard_task_runner.py:52} INFO - Started process 913 to run task
   [2022-02-16 18:35:36,842] {standard_task_runner.py:76} INFO - Running: ['***', 'tasks', 'run', 'test_dag', 'hello_world', 'manual__2022-02-16T18:35:35.078685+00:00', '--job-id', '5', '--raw', '--subdir', 'DAGS_FOLDER/test.py', '--cfg-path', '/tmp/tmp27iyimcv', '--error-file', '/tmp/tmpcxitey8_']
   [2022-02-16 18:35:36,843] {standard_task_runner.py:77} INFO - Job 5: Subtask hello_world
   [2022-02-16 18:35:36,901] {logging_mixin.py:109} INFO - Running <TaskInstance: test_dag.hello_world manual__2022-02-16T18:35:35.078685+00:00 [running]> on host 08bb57e5bcff
   [2022-02-16 18:35:36,945] {taskinstance.py:1424} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=***
   AIRFLOW_CTX_DAG_ID=test_dag
   AIRFLOW_CTX_TASK_ID=hello_world
   AIRFLOW_CTX_EXECUTION_DATE=2022-02-16T18:35:35.078685+00:00
   AIRFLOW_CTX_DAG_RUN_ID=manual__2022-02-16T18:35:35.078685+00:00
   [2022-02-16 18:35:36,947] {subprocess.py:62} INFO - Tmp dir root location: 
    /tmp
   [2022-02-16 18:35:36,948] {subprocess.py:74} INFO - Running command: ['bash', '-c', 'echo Hello World']
   [2022-02-16 18:35:36,957] {subprocess.py:85} INFO - Output:
   [2022-02-16 18:35:36,958] {subprocess.py:89} INFO - Hello World
   [2022-02-16 18:35:36,959] {subprocess.py:93} INFO - Command exited with return code 0
   [2022-02-16 18:35:36,980] {taskinstance.py:1267} INFO - Marking task as SUCCESS. dag_id=test_dag, task_id=hello_world, execution_date=20220216T183535, start_date=20220216T183536, end_date=20220216T183536
   [2022-02-16 18:35:37,015] {local_task_job.py:154} INFO - Task exited with return code 0
   [2022-02-16 18:35:37,039] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
   ```
   
   ### How to reproduce
   
   1. Download the latest docker-compose.yaml file from https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml
   2. Update the following environment variables in the docker-compose file:
       - AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: 'google_cloud_default'
       - AIRFLOW__LOGGING__REMOTE_LOGGING: 'true'
       - AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: 'stackdriver://airflow-tasks'
   3. Make sure the google_cloud_default connection is set up properly in the UI
   4. Run any example DAG
   5. Go to Google Cloud Platform Console > Cloud Logging > Logs Explorer
   6. Search for `<name_of_the_dag> resource.type="global"` (a programmatic equivalent is sketched after this list)
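
   The same query can be run with the google-cloud-logging client. This is a sketch: it assumes the same service-account credentials, and the label names below are the ones the Stackdriver task handler appears to attach (worth verifying against an actual entry in Logs Explorer):

   ```
   from google.cloud import logging

   client = logging.Client()

   # Equivalent of the Logs Explorer search in step 6, narrowed to one task.
   log_filter = (
       'resource.type="global" '
       f'AND logName="projects/{client.project}/logs/airflow-tasks" '
       'AND labels.task_id="hello_world"'
   )
   for entry in client.list_entries(filter_=log_filter):
       print(entry.payload)
   ```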
   
   ### Operating System
   
   macOS Monterey Version 12.2
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==6.2.0
   apache-airflow-providers-grpc==2.0.1
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   I am using the official docker-compose file from https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml and just adding the necessary environment variables.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #21623: Stackdriver Remote Logging - Only partial log content captured on Google Cloud Logging

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21623:
URL: https://github.com/apache/airflow/issues/21623#issuecomment-1042318492


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

