Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/02 00:07:41 UTC

[GitHub] [airflow] t4n1o edited a comment on issue #19192: The scheduler does not appear to be running. Last heartbeat was received X minutes ago.

t4n1o edited a comment on issue #19192:
URL: https://github.com/apache/airflow/issues/19192#issuecomment-1003639114


   Well, there is nothing locking the db. The issue occurs even when my application does not use a database at all.
   
   Types of tasks that cause this problem:
    - download all the archives from public.bitmex.com via a Python script (internet speed is the bottleneck; takes about 3 hours on the first run)
    - decompress csv.gz files into csv files (disk speed is the bottleneck; takes about 4 hours on the first run)
    - read the csv records for each day and transform them with a custom Rust parsing tool
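
To give a sense of what such a disk-bound task looks like, here is a minimal sketch of the decompression step using only the Python standard library. It is not the actual script: the function names `decompress_csv_gz` and `decompress_all` and the directory layout are hypothetical, chosen for illustration only.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def decompress_csv_gz(src: Path, dest_dir: Path) -> Path:
    """Decompress one .csv.gz archive into dest_dir and return the .csv path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Drop the trailing ".gz" so "trades.csv.gz" becomes "trades.csv".
    dest = dest_dir / src.name[: -len(".gz")]
    # Stream in 1 MiB chunks so a large archive never has to fit in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dest


def decompress_all(src_dir: Path, dest_dir: Path) -> List[Path]:
    """Decompress every *.csv.gz file found directly under src_dir."""
    return [decompress_csv_gz(p, dest_dir) for p in sorted(src_dir.glob("*.csv.gz"))]
```

A script like this would be started from a BashOperator as an ordinary long-running command; nothing in it touches a database.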
   
   Any program written in Rust or Python that takes a long time to execute will cause this problem. We are using Airflow because, once all the historical data is synced, we run the task once per day for each new day.
    
    
   ![image](https://user-images.githubusercontent.com/75998700/147862770-fd91e1ba-a761-46ca-b4e8-b3dc14760ac8.png)
   
    Here is a dump of what the scheduler is doing while it's stuck.
   
   ![image](https://user-images.githubusercontent.com/75998700/147862649-a5ddf8e6-fb3e-43bc-b3ad-721a71bad6b6.png)
   
   State of the various airflow processes:
   ![image](https://user-images.githubusercontent.com/75998700/147862782-ed08e5f0-29d4-47e8-8c0e-7c87f049c73f.png)
   
   
   I am starting the Rust/Python programs in a separate process with BashOperator, and it's stuck on `_recv()`. Since all the tasks are limited by the rate limit of some API, by disk speed, or by network speed, it would be better if Airflow could actually run more than one task at a time. Any ideas?

