Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/17 02:37:12 UTC

[GitHub] [airflow] kakarukeys opened a new issue #15415: S3 remote logging not working for airflow server components

kakarukeys opened a new issue #15415:
URL: https://github.com/apache/airflow/issues/15415


   **Apache Airflow version**: 2.0.1
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: on my laptop
   - **OS** (e.g. from /etc/os-release): macOS Mojave 10.14.6
   - **Kernel** (e.g. `uname -a`): Darwin Wongs-MBP 18.7.0 Darwin Kernel Version 18.7.0: Tue Jan 12 22:04:47 PST 2021; root:xnu-4903.278.56~1/RELEASE_X86_64 x86_64
   
   **What happened**:
   
   Configured remote logging to an S3 bucket; only the logs of DAG runs appeared in the bucket.
   Logs of the Airflow server components (scheduler, web server, etc.) did not appear.
   
   **What you expected to happen**:
   
   All logs go to the S3 bucket.
   
   **How to reproduce it**:
   
   1. follow the quick start guide in https://airflow.apache.org/docs/apache-airflow/stable/start/local.html
   
   2. before starting web server set the following variables:
   
   ```sh
   export AIRFLOW__LOGGING__REMOTE_LOGGING=True
   export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-bucket/
   export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=my_remote_logging_conn_id
   ```
   
   3. start the web server and set your S3 connection settings in the web server "connections" section.
   
   ```
   Conn Id * my_remote_logging_conn_id
   Conn Type  S3
   Extra {"region_name": "nyc3",
    "host": "https://nyc3.digitaloceanspaces.com",
    "aws_access_key_id": "xxx",
    "aws_secret_access_key": "xxx"}
   ```
   
   4. Restart the web server
   5. Start the scheduler in another console window (setting the same env variables)
   6. Execute a DAG
   7. Head to your S3 bucket UI; you will see that only logs of DAG runs appear.
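
The variables in step 2 follow Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention for overriding configuration from the environment. As a rough illustration of how such names map to config sections and keys, here is a minimal sketch. The helper `airflow_env_overrides` and the sample `HOME` value are hypothetical and written only for this example; this is not Airflow's own parsing code.

```python
def airflow_env_overrides(environ):
    """Return {(section, key): value} for variables named AIRFLOW__SECTION__KEY.

    Hypothetical helper for illustration only, not part of Airflow.
    """
    prefix = "AIRFLOW__"
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Split the remainder once on the double underscore: SECTION__KEY.
        parts = name[len(prefix):].split("__", 1)
        if len(parts) != 2 or not all(parts):
            continue
        section, key = parts
        overrides[(section.lower(), key.lower())] = value
    return overrides


# The three variables from the reproduction steps, plus one unrelated variable.
example_env = {
    "AIRFLOW__LOGGING__REMOTE_LOGGING": "True",
    "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER": "s3://my-bucket/",
    "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID": "my_remote_logging_conn_id",
    "HOME": "/Users/example",
}

overrides = airflow_env_overrides(example_env)
for (section, key), value in sorted(overrides.items()):
    print(f"[{section}] {key} = {value}")
```

All three settings land in the `logging` section, which is the section that controls remote logging in this report.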
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kakarukeys commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
kakarukeys commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822093282


   @xinbinhuang the "correct behavior" you mentioned is not documented anywhere in the docs. Even the [Logging Architecture](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-architecture.html#:~:text=By%20default%2C%20Airflow%20supports%20logging,environments%20and%20for%20quick%20debugging) picture shows that logging, remote or not, applies to all components. See where the lines above "Logging" connect to?





[GitHub] [airflow] kakarukeys commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
kakarukeys commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822917103


   Since this is not classified as a bug, I will close this. Thanks for your help.
   





[GitHub] [airflow] xinbinhuang commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822121691


   Please refer to the most up-to-date documentation instead. There is not much point in applying old documentation to a new version, especially across major releases (i.e. 1.10.2 -> 2.0.*).
   
   Other people would probably have more context, but I believe that *remote logging* has always applied only to *task instance logs* (#5175), and at some point the old documentation was updated to reflect that.





[GitHub] [airflow] mik-laj commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822864528


   Airflow handles some logs in a special way and unfortunately, it is not easy to unify this as our logs have different characteristics. 
   
   **Logs for tasks** are saved to files because they are small and we can upload them after the tasks complete. When they are sent to object storage it is much easier to read them, but each time you add a new line, we need to get all the contents and upload a new file with the full log contents. Object stores have limited support for append operations.
   On the other hand, logs for other components are an endless stream of data that cannot be paused in order to upload the content. For this reason, conventional tools fit better because they are optimized to handle such operations.
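
To make the append cost concrete, here is a minimal sketch that assumes an object store offering only whole-object get and put. `InMemoryObjectStore` and `append_line` are hypothetical names invented for this example; they do not model the S3 API or Airflow's log handlers. Appending line by line re-uploads the whole log on every write, so the total bytes transferred grow roughly quadratically with the number of lines.

```python
class InMemoryObjectStore:
    """Toy stand-in for an object store: whole-object get/put only, no append."""

    def __init__(self):
        self._objects = {}
        self.bytes_uploaded = 0  # running total of bytes sent by put()

    def get(self, key):
        return self._objects.get(key, b"")

    def put(self, key, data):
        self._objects[key] = data
        self.bytes_uploaded += len(data)


def append_line(store, key, line):
    """Emulate append by downloading the object and re-uploading it in full."""
    existing = store.get(key)
    store.put(key, existing + line.encode("utf-8") + b"\n")


store = InMemoryObjectStore()
for i in range(1000):
    # Each line is 15 bytes including the trailing newline.
    append_line(store, "scheduler.log", f"heartbeat {i:04d}")

final_size = len(store.get("scheduler.log"))
print(f"final object size: {final_size} bytes")
print(f"total bytes uploaded: {store.bytes_uploaded} bytes")
```

A finished task log avoids this because it can be uploaded once at the end, whereas a scheduler or web server log never reaches an end.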
   
   I don't think we can fix this problem, but we can update the documentation to better describe these project assumptions.
   
   Related issue: https://github.com/apache/airflow/issues/10593





[GitHub] [airflow] xinbinhuang edited a comment on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822098377


   See below the part inside the red circle:
   ![image](https://user-images.githubusercontent.com/27927454/115167535-e000e800-a06c-11eb-8032-d3366777a175.png)
   
   Regardless, I agree with you that _remote logging applies to only tasks_ may not be obvious enough in the documentation. Would you like to submit a PR to improve the documentation?





[GitHub] [airflow] boring-cyborg[bot] commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-821752982


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] kakarukeys closed issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
kakarukeys closed issue #15415:
URL: https://github.com/apache/airflow/issues/15415


   





[GitHub] [airflow] kakarukeys commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
kakarukeys commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822106041


   I was following this doc: https://airflow.apache.org/docs/apache-airflow/1.10.2/howto/write-logs.html
   
   Since I consider it a questionable design decision to have some logs written to one place (mostly ephemeral storage) and other logs written to another place under a single logging config, I think I will decline that.
   
   I can look into changing the code instead, if everyone agrees it is a bug.
   





[GitHub] [airflow] xinbinhuang edited a comment on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822043491


   Hi @kakarukeys, this is the correct behavior: currently, remote logging only sends `task logs` to remote storage. If you need to configure other logging behaviors, e.g. scheduler, webserver, you can create a custom logging config for use with your deployment. (see [Logging for Tasks - Advanced Configuration](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#advanced-configuration))
   
   Another production deployment option is to use FluentD to capture logs and send them to destinations such as ElasticSearch or Splunk. (ref [Logging Architecture](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-architecture.html#:~:text=By%20default%2C%20Airflow%20supports%20logging,environments%20and%20for%20quick%20debugging.))
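
A custom logging config of the kind suggested above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not Airflow's actual configuration: `BASE_LOGGING_CONFIG` is a hypothetical stand-in for the base dict a deployment would copy (in Airflow, `DEFAULT_LOGGING_CONFIG` from `airflow.config_templates.airflow_local_settings`), and the handler name `component_file`, the logger name `scheduler`, and the log path are invented for this example.

```python
import copy
import logging
import logging.config
import os
import tempfile

# Hypothetical stand-in for the base logging dict a deployment would start from.
BASE_LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

# Assumed location for component logs; a log shipper could tail this file.
LOG_PATH = os.path.join(tempfile.gettempdir(), "airflow-components-example.log")

# Copy the base config so the original dict is left untouched.
LOGGING_CONFIG = copy.deepcopy(BASE_LOGGING_CONFIG)

# Add a rotating file handler for long-running component logs.
LOGGING_CONFIG["handlers"]["component_file"] = {
    "class": "logging.handlers.RotatingFileHandler",
    "formatter": "plain",
    "filename": LOG_PATH,
    "maxBytes": 1_000_000,
    "backupCount": 3,
}
LOGGING_CONFIG["root"]["handlers"].append("component_file")

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("scheduler").info("scheduler heartbeat")
```

In an Airflow deployment, a module defining such a dict would be referenced through the `logging_config_class` setting, as described in the Advanced Configuration page linked above. Writing component logs to a rotating local file keeps them in a form that a tool such as FluentD can stream onward.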





[GitHub] [airflow] kakarukeys commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
kakarukeys commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822190523


   Ah, my bad habit of treating the first Google result as the source of truth.
   
   On a first read of the 2.0.1 docs, I can see that the [Logging and Monitoring architecture page](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-architecture.html) is still misleading.
   
   [diagram](https://ibb.co/hgDV5Vb)
   
   > By default, Airflow supports logging into the local file system. These include logs from the Web server, the Scheduler, and the Workers running tasks. This is suitable for development environments and for quick debugging.
   > For cloud deployments, Airflow also has handlers contributed by the Community for logging to cloud storage such as AWS, Google Cloud, and Azure.
   
   "Logging for Tasks" lacks examples for different platforms.
   "Logging for Others" can't be found.
   





[GitHub] [airflow] xinbinhuang commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822098377


   See below the part inside the red circle:
   ![image](https://user-images.githubusercontent.com/27927454/115167535-e000e800-a06c-11eb-8032-d3366777a175.png)
   
   Regardless, I agree with you that _remote logging applies to only tasks_ may not be obvious enough in the documentation.





[GitHub] [airflow] xinbinhuang edited a comment on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822121691


   Please refer to the most up-to-date documentation instead. There is not much point in applying old documentation to a new version, especially across major releases (i.e. 1.10.2 -> 2.0.*).
   
   That being said, I believe that *remote logging* has always applied only to *task instance logs* (#5175), and at some point the old documentation was updated to reflect that.





[GitHub] [airflow] xinbinhuang edited a comment on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang edited a comment on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822098377


   See below the part inside the red circle:
   ![image](https://user-images.githubusercontent.com/27927454/115167535-e000e800-a06c-11eb-8032-d3366777a175.png)
   
   Regardless, I agree with you that _remote logging applies to only tasks_ may not be obvious enough in the documentation. Would you like to submit a PR to improve the documentation? :)





[GitHub] [airflow] xinbinhuang commented on issue #15415: S3 remote logging not working for airflow server components

Posted by GitBox <gi...@apache.org>.
xinbinhuang commented on issue #15415:
URL: https://github.com/apache/airflow/issues/15415#issuecomment-822043491


   This is the correct behavior: currently, remote logging only sends `task logs` to remote storage. If you need to configure other logging behaviors, e.g. scheduler, webserver, you can create a custom logging config for use with your deployment. (see [Logging for Tasks - Advanced Configuration](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#advanced-configuration))
   
   Another production deployment option is to use FluentD to capture logs and send them to destinations such as ElasticSearch or Splunk. (ref [Logging Architecture](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-architecture.html#:~:text=By%20default%2C%20Airflow%20supports%20logging,environments%20and%20for%20quick%20debugging.))

