Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/09 00:39:18 UTC

[GitHub] [airflow] potiuk commented on issue #21078: DAG with multiple async tasks leads to MySQL errors... which lead to failed tasks

potiuk commented on issue #21078:
URL: https://github.com/apache/airflow/issues/21078#issuecomment-1241371006

   Unfortunately there is no clear indication from the logs and it's also hard to correlate them. If you could help by narrowing this down and selecting a much smaller set of queries corresponding to the actual problem, that would probably help. Otherwise, simply looking through your log and trying to find the cause is just too much of an investment (you have to remember this is free software, and any help you get here is given in the time we can spare outside of our normal working hours). If you can narrow it down even more, then trying to help you will be more efficient.
   
   From the look I had in the time I could spare, I think the problem is that you might be hitting the limits on the size of queries your server can handle - your server is simply not able to cope with the queries that Airflow is sending to it.
   
   What I could recommend is to look at some of the scheduler configuration parameters, tune down some of the values there, and observe whether that helps: https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html#scheduler-configuration-options
   
   The chapter explains in general the fine-tuning options you can choose and how they impact the scheduler: https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html#fine-tuning-your-scheduler-performance
   
   Unfortunately there are no "ready recipes" for what to do and what to configure, because depending on your (multitude of) MySQL settings there are various limits you can start hitting, and there is no clear error in the logs to indicate which one it is. However, decreasing some of the "max values" that the scheduler processes in a single loop, for example, might improve things a lot - and it would be great to hear whether it helps when you try it. Just make sure to experiment with one setting at a time.
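   
   As a rough illustration of "one setting at a time", here is a minimal Python sketch that builds one environment-variable override per experiment, using Airflow's `AIRFLOW__{SECTION}__{KEY}` convention. The helper names are hypothetical, the trial values are purely illustrative, and the option names are the `[scheduler]` settings as I understand them from the configuration reference - please verify them against the Airflow version you run.

```python
# Sketch only: helper names are hypothetical and the trial values are
# illustrative assumptions, not recommended settings.

def airflow_env_override(section, key, value):
    """Return the env var that overrides a single Airflow config option."""
    return {f"AIRFLOW__{section.upper()}__{key.upper()}": str(value)}


# Candidate "max values" from the [scheduler] section to try lowering.
# Check that each option exists in your Airflow version before using it.
CANDIDATES = [
    ("scheduler", "max_tis_per_query", 64),
    ("scheduler", "max_dagruns_to_create_per_loop", 5),
    ("scheduler", "max_dagruns_per_loop_to_schedule", 10),
]


def one_at_a_time(candidates):
    """Yield one override per experiment so any effect can be attributed."""
    for section, key, value in candidates:
        yield airflow_env_override(section, key, value)


if __name__ == "__main__":
    for env in one_at_a_time(CANDIDATES):
        # Apply exactly one override, restart the scheduler, observe the logs.
        print(env)
```

   Each printed dictionary is meant to be applied on its own, with the scheduler restarted and the MySQL errors observed before moving on to the next one.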
   
   Another general piece of advice I'd have - if you possibly can, switch to Postgres. More than 80% of our users use Postgres (https://airflow.apache.org/blog/airflow-survey-2022/) and possibly 90% of stability problems (similar to yours) come from MySQL (so you can see that the stability of Postgres is likely a few orders of magnitude better than that of MySQL). If you are looking for a quick solution to your problems, this might simply be the most pragmatic and quickest approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org