Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/18 07:37:26 UTC

[GitHub] [airflow] ArshiAAkhavan opened a new issue #17678: log aggregation from kubernetesExecutor using kubernetesAPI

ArshiAAkhavan opened a new issue #17678:
URL: https://github.com/apache/airflow/issues/17678


   **Description**
   There are three main ways to store logs for Airflow on `k8s`: a shared `PersistentVolume`, `s3` object storage, and `elasticsearch`.
   Putting the shared volume aside, the other two don't work well when `kubernetesExecutor` is your executor.
   The current flow (I have only tested the Elasticsearch route, so I'll explain it with `elasticsearch`):
   - for each task, `kubernetesExecutor` creates a pod that acts as the worker node for that task
   - task logs are either persisted on the pod's local disk or written to the container's log via `elasticsearch.write_stdout=true` in the `airflow.cfg` file
   - from there, a log aggregator service (in my case `filebeat` + `logstash`) reads the log files and sends them to Elasticsearch
   
   The setup above can be improved, because as of now:
   - either the logs are persisted on the container's local disk, so you need a `filebeat`/`logstash` container next to each worker container, with a shared volume mounted on the log directory
   - or you set `elasticsearch.write_stdout=true`, which lets a `daemonset` of `filebeat` (one instance per node) read the container log files generated by `k8s` (usually `/var/log/containers/*.log`)
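For the second option, the setting has to reach every worker pod. A minimal sketch of one way to do that, assuming the options meant here are `write_stdout` and `json_format` in the `[elasticsearch]` section, and relying on Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable convention. The helper names are hypothetical:

```python
def airflow_env_var(section, key):
    """Build the environment variable name Airflow reads for a config option.

    Airflow maps `[section] key` to AIRFLOW__{SECTION}__{KEY} (upper-cased,
    with double underscores as separators).
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def stdout_logging_env():
    """Environment for a worker pod so task logs go to the container's stdout.

    Assumption: `write_stdout` and `json_format` are the intended options.
    """
    return {
        airflow_env_var("elasticsearch", "write_stdout"): "True",
        airflow_env_var("elasticsearch", "json_format"): "True",
    }
```

The resulting dictionary could be copied into the `env` section of the worker pod template, so no `airflow.cfg` has to be baked into the image.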
   
   Although both approaches work, both of them waste resources.
   
   Meanwhile, we have the `kubernetesPodOperator`, which by default does the fascinating job of retrieving logs from the task pod's stdout via the Kubernetes API!
   
   I was thinking that by combining these two features (`elasticsearch.write_stdout=true` and the default behavior of `kubernetesPodOperator`), we could send logs from the worker pods directly to the scheduler and store them in the scheduler pod instead.
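A rough sketch of what the scheduler-side part of this idea could look like, not an existing Airflow feature. It assumes the official `kubernetes` Python client, whose `CoreV1Api.read_namespaced_pod_log` returns a container's log as a string. The function name `collect_pod_logs`, the directory layout, and the default container name `base` are assumptions made for illustration:

```python
from pathlib import Path


def collect_pod_logs(core_v1, pod_name, namespace, log_dir, container="base"):
    """Fetch a worker pod's stdout via the Kubernetes API and store it locally.

    `core_v1` is expected to behave like kubernetes.client.CoreV1Api.
    The container name "base" is an assumption about the worker pod spec.
    """
    # Ask the API server for everything the container wrote to stdout/stderr.
    text = core_v1.read_namespaced_pod_log(
        name=pod_name,
        namespace=namespace,
        container=container,
    )
    # Hypothetical layout: <log_dir>/<namespace>/<pod_name>.log
    target = Path(log_dir) / namespace / f"{pod_name}.log"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

Inside the cluster, the client would typically be created with `kubernetes.config.load_incluster_config()` followed by `kubernetes.client.CoreV1Api()`, and the pod's service account would need permission to read `pods/log`. A single `filebeat` next to the scheduler could then ship the files under `log_dir`.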
    
   **Use case / motivation**
   Well, first of all, if you deploy your scheduler and webserver outside of your `k8s` cluster, that's the end of the road: your logs are already stored on a disk visible to both the webserver and the scheduler.
   
   If you deploy your scheduler and webserver on `k8s` (which is common practice), you still need the `logstash`/`filebeat` service to send logs to your `elasticsearch` instance, but this time you won't need a whole `daemonset` or one instance per worker pod. One per scheduler pod would suffice, which uses far fewer resources (in my case I have only one scheduler pod, so it's just 1!).
   
   **What do you want to happen?**
   The whole process of remote logging to `elasticsearch` is much harder than the other parts of deploying Airflow with `kubernetesExecutor`, and I am trying to make it easier.
   
   Also, I feel like it's the more k8s-ish way to do it!!
   
   **Are you willing to submit a PR?**
   If pointed in the right direction, yes!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #17678: log aggregation from kubernetesExecutor using kubernetesAPI

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17678:
URL: https://github.com/apache/airflow/issues/17678#issuecomment-900891049


   Thanks for opening your first issue here! Be sure to follow the issue template!
   




