Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/14 12:31:34 UTC

[GitHub] [airflow] datsabk opened a new issue #18239: Airflow Celery Worker logs inaccessible

datsabk opened a new issue #18239:
URL: https://github.com/apache/airflow/issues/18239


   ### Apache Airflow version
   
   2.1.3 (latest released)
   
   ### Operating System
   
   Python 3.6, Apache Airflow Docker image
   
   ### Versions of Apache Airflow Providers
   
   2.1.3
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Kubernetes-based deployment: the workers and the master run as pods in Kubernetes, and logs are accessed via a NodePort Service.
   
   ### What happened
   
   *** Log file does not exist: /xxx/airflow/home/logs/xxxx/2021-09-14T12:13:32.383510+00:00/1.log
   *** Fetching from: http://tsc-aflow-orca:8793/log/xxxx/2021-09-14T12:13:32.383510+00:00/1.log
   *** Failed to fetch log file from worker. 503 Server Error: Service Unavailable for url: http://tsc-aflow-orca:8793/log/xxxx/2021-09-14T12:13:32.383510+00:00/1.log
   For more information check: https://httpstatuses.com/503
   
   I checked the worker pod, and the log files exist there. However, the worker's log-serving web service seems unable to access them.
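The second and third log lines show where the webserver looked after the local check failed. It combined a hostname recorded for the task (`tsc-aflow-orca`), the worker log port (8793) and the log's path relative to the logs folder. Below is a minimal sketch of that URL shape. It is inferred only from the error output above; `worker_log_url` is a hypothetical helper, not an Airflow API.

```python
def worker_log_url(hostname: str, port: int, relative_log_path: str) -> str:
    """Build the URL of a task log on a worker's log server.

    `hostname` is whatever was recorded for the task instance, and
    `relative_log_path` is the log path relative to the logs folder.
    """
    # Avoid a double slash if the relative path starts with "/".
    return f"http://{hostname}:{port}/log/{relative_log_path.lstrip('/')}"


url = worker_log_url("tsc-aflow-orca", 8793, "xxxx/2021-09-14T12:13:32.383510+00:00/1.log")
print(url)
```

Requesting that URL from inside the webserver pod (for example with `curl`) reproduces the failing request outside the UI.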
   
   ### What you expected to happen
   
   Worker logs worked fine with the same setup on an older version, v1.10.12.
   
   ### How to reproduce
   
   Run a sample DAG with the webserver, scheduler and one worker running in Kubernetes, with no custom setup.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] pvanliefland commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
pvanliefland commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-944256151


   I'm using the helm chart and experiencing something similar. When using `KubernetesExecutor` and `logs.persistence.existingClaim`, it works fine. As soon as I switch to `CeleryExecutor` (without touching `logs.persistence.existingClaim`), I also get `Log file does not exist`.



[GitHub] [airflow] changxiaoju commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
changxiaoju commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-1046765625


   > In airflow.cfg set this
   > 
   > ```
   > hostname_callable = airflow.utils.get_host_ip_address
   > ```
   > 
   > If still doesn't work, ensure worker containers are exposing port 8793 in kubernetes template.
   > 
   > Log existence check failed because webserver probably tries to find logs locally, but they're probably stored on worker. It does fallback to logic of retrieval of those logs from worker containers using REST.
   
   Hi dimon222, I set `hostname_callable = airflow.utils.get_host_ip_address` and it worked a few times, then stopped working again. What can I do next? Also, how do I "ensure worker containers are exposing port 8793 in kubernetes template"? I really appreciate your help.



[GitHub] [airflow] dimon222 commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
dimon222 commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-1046874586


   @changxiaoju 
   Set `containerPort` for the Airflow worker container in its `ports` property; see
   https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/
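For reference, the relevant part of a worker container spec might look like the sketch below. It is written as a Python dict serialised to JSON, because Kubernetes accepts JSON manifests as well as YAML. The container name, port name and image tag are placeholders; only `ports[].containerPort` is the setting being discussed.

```python
import json


def worker_container_spec(image: str, log_port: int = 8793) -> dict:
    """Kubernetes container spec (plain dict) for an Airflow worker that declares its log port."""
    return {
        "name": "airflow-worker",  # placeholder container name
        "image": image,
        "ports": [
            {
                "name": "worker-logs",      # placeholder; must be a valid IANA_SVC_NAME (<= 15 chars)
                "containerPort": log_port,  # port the worker's log server listens on
                "protocol": "TCP",
            }
        ],
    }


spec = worker_container_spec("apache/airflow:2.1.3")
print(json.dumps(spec, indent=2))
```

According to the Kubernetes API reference for `containerPort`, listing a port is primarily informational. Omitting it does not stop a process from accepting connections on that port via the pod IP. Declaring it still helps when a Service or NetworkPolicy refers to the port, for example through a named `targetPort`.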



[GitHub] [airflow] datsabk commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
datsabk commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-922619410


   Hello @dimon222, the workers sit behind a Kubernetes Service, so the logs need to be accessible via the service name. To support this, I used a different hostname callable that returns the Service's DNS name.
   
   I don't see how this could be a problem, though. Imagine a setup like this:
   
   - master.abc.com -> Airflow master
   - worker.abc.com -> Airflow workers (multiple behind a load balancer sharing logs storage)
   
   I need the logs to be accessible via worker.abc.com rather than via an IP address.
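The two hostname strategies discussed in this thread can be contrasted in a short sketch.

- The IP-based callable resolves the pod's own FQDN to an address. This is roughly what `get_host_ip_address` does; in Airflow's source that helper lives in `airflow.utils.net`, which is worth checking against your version.
- The second callable is a hypothetical stand-in for the setup described above. It always returns the Service's DNS name.

The webserver uses the recorded value as the host in the worker log URL. A per-pod value therefore points at one specific worker, while a Service name lets the load balancer pick any worker.

```python
import socket


def pod_ip_hostname() -> str:
    """Resolve this machine's fully qualified name to an IPv4 address.

    Unique per pod, so the webserver can reach the exact worker that ran the task.
    """
    return socket.gethostbyname(socket.getfqdn())


def service_dns_hostname() -> str:
    """Hypothetical callable that always returns a fixed Service name.

    Every task instance records the same value, so the webserver cannot tell
    which pod behind the Service actually holds the log file.
    """
    return "worker.abc.com"
```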




[GitHub] [airflow] dimon222 commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
dimon222 commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-1047109247


   > > @changxiaoju containerPort on container of airflow worker in property `ports` https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/
   > 
   > Thank you, but I am not using CeleryExecutor; I use LocalExecutor instead. What might cause the error then?
   
   If the error still includes the port in the same way as in the first message, you could try exposing that port on your scheduler. If not, I suspect you have an unrelated exception and should probably open a separate ticket for it.
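Before digging further, you can check whether the log server port is reachable at all with a plain TCP connection test run from inside the webserver pod. Below is a minimal sketch using only Python's standard library. `port_is_open` is a hypothetical helper name, and the host in the usage line is the one from the original error.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Usage: `port_is_open("tsc-aflow-orca", 8793)`. A `True` result only shows that something accepts connections on that port. It does not prove that the process behind it will serve the requested log file.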



[GitHub] [airflow] changxiaoju commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
changxiaoju commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-1047027219


   > @changxiaoju containerPort on container of airflow worker in property `ports` https://kubernetes.io/docs/concepts/services-networking/connect-applications-service/
   
   Thank you, but I am not using CeleryExecutor; I use LocalExecutor instead. What might cause the error then?




[GitHub] [airflow] dimon222 edited a comment on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
dimon222 edited a comment on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-921976922


   In airflow.cfg set this
   ```
   hostname_callable = airflow.utils.get_host_ip_address
   ```
   
   The log existence check failed because the webserver probably looks for the logs locally, but they are probably stored on the worker.
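The behaviour described here can be sketched as follows: look for the file on the webserver first, then ask the worker over HTTP. This is a simplified illustration, not Airflow's actual file task handler. The function name `read_task_log` and the injectable `fetch` callable are hypothetical, and the messages only mirror those quoted in the original report. The sketch also shows why a log volume mounted on both the workers and the webserver sidesteps the HTTP step: the local check simply succeeds.

```python
import os
from typing import Callable


def read_task_log(local_path: str, remote_url: str, fetch: Callable[[str], str]) -> str:
    """Return a task log, preferring the local file and falling back to the worker.

    `fetch` performs the HTTP GET against the worker's log server; it is passed in
    so the sketch stays independent of any particular HTTP client.
    """
    if os.path.exists(local_path):
        # The log is on a volume the webserver can see (e.g. shared storage).
        with open(local_path, encoding="utf-8") as f:
            return f.read()

    # Not available locally: note that, then try the worker's log server.
    header = (
        f"*** Log file does not exist: {local_path}\n"
        f"*** Fetching from: {remote_url}\n"
    )
    try:
        return header + fetch(remote_url)
    except OSError as exc:
        # urllib and requests connection/HTTP errors both derive from OSError.
        return header + f"*** Failed to fetch log file from worker. {exc}\n"
```

With the standard library, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=5).read().decode()`. An HTTP 503 then surfaces as an `urllib.error.HTTPError`, which this sketch reports instead of raising.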



[GitHub] [airflow] dimon222 edited a comment on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
dimon222 edited a comment on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-944827298


   > By running the workers behind a load balancer, you're removing the webserver's ability to specify which server the logs are stored on.
   
   I would assume "shared logs" means the storage itself is shared (a mounted volume, a PVC in Kubernetes, etc.). Or is there something on the worker itself that restricts it to serving only what was allocated to that specific worker?



[GitHub] [airflow] boring-cyborg[bot] commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-919104528


   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] SamWheating commented on issue #18239: Airflow Celery Worker logs inaccessible

Posted by GitBox <gi...@apache.org>.
SamWheating commented on issue #18239:
URL: https://github.com/apache/airflow/issues/18239#issuecomment-944809282


   > worker.abc.com -> Airflow workers (multiple behind a load balancer sharing logs storage)
   
   I don't think this is going to work, since each Airflow worker runs a background Flask application that serves logs from the tasks run on that worker at the specified logging port (8793 in this case). 
   
   https://github.com/apache/airflow/blob/9b3ed1f652fcdf6eaf672e5d15646a0512b852f4/airflow/utils/serve_logs.py#L70-L72
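To make the per-worker nature concrete, here is a minimal standard-library stand-in for such a log server. It is not Airflow's `serve_logs`, which per the link above is a Flask application, and it omits anything else the real implementation may do. It only shows that each process serves files from its *own* local log folder under `/log/`, so a request routed to a different worker finds nothing. `make_log_server` is a hypothetical name.

```python
import functools
import http.server


def make_log_server(log_dir: str, port: int = 8793, host: str = "0.0.0.0") -> http.server.ThreadingHTTPServer:
    """Serve files below `log_dir` at /log/<relative path> (requires Python 3.7+)."""

    class LogHandler(http.server.SimpleHTTPRequestHandler):
        def do_GET(self):
            if not self.path.startswith("/log/"):
                self.send_error(404, "Not Found")
                return
            # Drop the /log prefix; the base class maps the rest onto `directory`
            # and discards ".." components when translating the path.
            self.path = self.path[len("/log"):]
            super().do_GET()

    handler = functools.partial(LogHandler, directory=log_dir)
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Each worker would call `make_log_server(<its local log folder>).serve_forever()`. Two such servers with different folders answer the same URL differently, which is exactly the situation a load balancer in front of them creates.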
   
   By running the workers behind a load balancer, you're removing the webserver's ability to specify which server the logs are stored on. I suspect that if you reload the page enough times, it may eventually work when your request happens to be routed to the correct worker by the load balancer. 
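The "reload enough times" effect can be quantified under a simplifying assumption. Suppose the load balancer picks one of N workers uniformly at random for each request, and only one of them has the file. The chance that at least one of k reloads succeeds is then 1 - (1 - 1/N)^k. Real kube-proxy routing, keep-alive connections and session affinity can all make actual behaviour differ, so treat this as an illustration only. `success_probability` is a hypothetical helper.

```python
def success_probability(num_workers: int, reloads: int) -> float:
    """Chance that at least one of `reloads` requests reaches the single worker holding the log.

    Assumes each request is routed to one of `num_workers` workers uniformly at random.
    """
    if num_workers < 1 or reloads < 0:
        raise ValueError("need num_workers >= 1 and reloads >= 0")
    miss = 1 - 1 / num_workers  # probability a single request lands on a wrong worker
    return 1 - miss ** reloads
```

For example, with 3 workers and 5 reloads this gives 1 - (2/3)^5 = 211/243, roughly 0.87.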

