Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/23 14:47:06 UTC

[GitHub] [airflow] norwoodj edited a comment on issue #7935: scheduler gets stuck without a trace

norwoodj edited a comment on issue #7935:
URL: https://github.com/apache/airflow/issues/7935#issuecomment-715386389


   @teastburn we tried these settings and they did not fix things. @duyet I tailed those logs and didn't see anything out of the ordinary; the logs just... stop. It's happening right now, so I can collect any debugging info you'd like:
   ![Screen Shot 2020-10-23 at 14 40 03](https://user-images.githubusercontent.com/2896045/97017682-c0658880-153d-11eb-92de-47b174588dbb.png).
    
   We use Airflow here at Cloudflare to run a couple hundred jobs a day, and this has become a major issue for us. It is very difficult for us to downgrade to a version older than 1.10, and ever since we upgraded this has been a persistent and very annoying issue. Every 3-6 hours, every day for the past 4 months, the scheduler just stops running. The only "solution" we've found is to run a cronjob that kills the scheduler pod every 6 hours. That leaves a ton of dangling tasks around, so it is not a permanent or even really a workable solution.
   
   I'm happy to debug as much as possible. I've tried digging into the code myself as well, but I simply don't have the familiarity to figure out what's going wrong without a significant time investment. Any help y'all can give us would be massively appreciated. At this point we're considering dropping Airflow; we simply can't continue working with such a flaky platform.

