Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/10/24 15:41:56 UTC

[GitHub] [airflow-site] gary-harpaz opened a new issue #92: Performance issues

URL: https://github.com/apache/airflow-site/issues/92
 
 
   Hi,
   
   At my company we are working on a POC/MVP. We are introducing Airflow to replace our aging custom cron management system.
   We heard good things about it and want to try it out.
   
   I am trying out the v1-10-stable branch (version 1.10.5) with the provided Dockerfile. It is running on my laptop (16 GB of memory, 8 cores, Ubuntu) and is configured to use the CeleryExecutor with a MySQL backend.
   
   What I am seeing is very high CPU usage from both the Airflow scheduler and the Airflow webserver. The webserver is also consuming a huge amount of memory, approximately 10 GB. This happens when no DAGs are even running!
   
   This doesn't make sense to me given how light the load is.
   
   The process we want to implement can be decomposed into a 3-node graph. Without going into detail, it fetches a specific data type for a specific customer for a specific date and uploads it to S3.
   There are 17 data types and 500 customers, and each day we need to fetch the last 30 days of data. Each customer is in a different timezone.
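The daily "30 days back, per customer timezone" requirement can be sketched in plain Python (no Airflow imports). This is a minimal illustration under stated assumptions: the 30-day window is taken to be the 30 days before the run date, fixed UTC offsets stand in for real timezones (so DST is ignored), and the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

LOOKBACK_DAYS = 30  # "we need to fetch 30 days back of data each day"

def lookback_dates(run_date: date, days: int = LOOKBACK_DAYS) -> list[date]:
    """Dates to fetch on run_date, oldest first (assumption: the `days` days before run_date)."""
    return [run_date - timedelta(days=offset) for offset in range(days, 0, -1)]

def local_midnight_utc(run_date: date, customer_tz: timezone) -> datetime:
    """UTC instant at which run_date starts in the customer's fixed-offset timezone."""
    local_midnight = datetime.combine(run_date, time(0, 0), tzinfo=customer_tz)
    return local_midnight.astimezone(timezone.utc)

# Example: a hypothetical customer at UTC-5, run for 2019-10-24.
customer_tz = timezone(timedelta(hours=-5))
start_utc = local_midnight_utc(date(2019, 10, 24), customer_tz)  # 2019-10-24 05:00 UTC
window = lookback_dates(date(2019, 10, 24))                       # 2019-09-24 .. 2019-10-23
```

Each of the 500 customers gets its own start instant, which is why the issue schedules one DAG per customer.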
   
   We implemented this as 500 DAGs, each scheduled according to its customer's timezone. Each DAG has 3×30×17 = 1530 nodes in its graph. We also tested this using only dummy operators.
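To put the scale in perspective, here is a plain-Python sketch (no Airflow imports) of one customer's graph as described above: one 3-step chain per (data type, day). The step and task names are hypothetical, since the issue only says the chain fetches data and uploads it to S3. Across 500 DAGs this comes to 765,000 task definitions that Airflow has to parse.

```python
from itertools import product

N_CUSTOMERS = 500
N_DATA_TYPES = 17
N_DAYS = 30
# Hypothetical names: the issue describes a fetch and an S3 upload but does not name the 3 nodes.
STEPS = ("step_1", "step_2", "step_3")

def build_customer_graph() -> dict[str, list[str]]:
    """Adjacency list for one customer's DAG: one linear 3-step chain per (data type, day)."""
    graph: dict[str, list[str]] = {}
    for data_type, day in product(range(N_DATA_TYPES), range(N_DAYS)):
        chain = [f"type{data_type}_day{day}_{step}" for step in STEPS]
        for upstream, downstream in zip(chain, chain[1:]):
            graph.setdefault(upstream, []).append(downstream)
        graph.setdefault(chain[-1], [])  # the last step has no downstream task
    return graph

graph = build_customer_graph()
tasks_per_dag = len(graph)                                # 3 * 30 * 17 = 1530
edges_per_dag = sum(len(deps) for deps in graph.values()) # 2 * 30 * 17 = 1020
total_tasks = N_CUSTOMERS * tasks_per_dag                 # 500 * 1530 = 765,000
```

The chains are independent of each other, so every DAG is 510 parallel 3-task pipelines rather than one deep graph. The cost comes from the number of task objects, not from graph depth.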
   
   In addition to the high CPU usage, the webserver is not responding. It spins up worker processes that always run at 100% CPU, and they are always "filling dagbag".
   
   Does Airflow have performance issues just from defining lots of DAGs with lots of operators?
   
   Any help is much appreciated.
   
   Thanks,
   
   Gary
