Posted to commits@airflow.apache.org by "lyso (via GitHub)" <gi...@apache.org> on 2023/03/04 10:40:18 UTC

[GitHub] [airflow] lyso commented on pull request #29602: Change config max tis per query

lyso commented on PR #29602:
URL: https://github.com/apache/airflow/pull/29602#issuecomment-1454694919

   > Thanks for your contributions.
   > 
   > Could you describe a bit more about your Airflow configuration, your DB backend, how far it is from the Airflow services (latency), and your workload? I ask because it looks to me like this PR might replace [one magic number](https://github.com/apache/airflow/pull/3324#discussion_r187540176) with another.
   > 
   > I've also found that `max_tis_per_query` is used in BackfillJob:
   > 
   > https://github.com/apache/airflow/blob/aa4858d85a00759a4a84bdd5fb3fe6cec196831e/airflow/jobs/backfill_job.py#L957-L959
   > 
   > Do you know how it affects this part of the code?
   
   I use an MWAA medium environment, so the DB is RDS PostgreSQL, and 2 schedulers run on Fargate containers with 2 vCPU and 4 GB RAM. The executor is CeleryExecutor via SQS.
   
   In the worst case, 200-300 tasks are scheduled at the beginning of each hour. Because of the 512 default for ``max_tis_per_query``, all of these tasks go into a single query batch when a scheduler enqueues tasks. It takes about 1-2 minutes to enqueue these tasks and send them out to the Celery broker. Based on my observation, the bottleneck is not the DB query but the in-memory processing in [``_critical_section_enqueue_task_instances``](https://github.com/apache/airflow/blob/c53a3e153f0ab5ca933a94adc01dc6314ea8d4d1/airflow/jobs/scheduler_job.py#L972) and ``executor.heartbeat``. The scheduler needs 1-2 minutes to change the TI state from ``SCHEDULED`` to ``QUEUED`` and then send them out to the broker.
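   To make the effect concrete, here is a minimal back-of-the-envelope sketch (plain Python, not Airflow code; the per-TI cost is an assumed number purely for illustration) of how the cap bounds the longest single scheduling loop:
   
   ```python
   # Back-of-the-envelope sketch, not Airflow code. PER_TI_COST_S is an assumed,
   # illustrative cost per task instance (state change + Celery send), not a measurement.
   PER_TI_COST_S = 0.3
   SCHEDULED_TIS = 300   # tasks appearing at the top of the hour
   
   def loop_batches(n_tis: int, cap: int):
       """Yield the batch sizes the scheduler would process, one batch per scheduling loop."""
       while n_tis > 0:
           batch = min(cap, n_tis)
           yield batch
           n_tis -= batch
   
   for cap in (512, 50):
       longest = max(loop_batches(SCHEDULED_TIS, cap))
       print(f"cap={cap:3d}: longest single loop ~{longest * PER_TI_COST_S:.0f}s before the next heartbeat")
   ```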
   
   During this 1-2 minute period, the scheduler cannot heartbeat and can be detected as inactive, which may cause unexpected behavior such as orphaned task adoption and external task kills.
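   Roughly speaking, liveness comes down to comparing the time since the last heartbeat against a threshold; a small illustrative sketch (the 30-second value is only an assumed threshold in the spirit of ``scheduler_health_check_threshold``, adjust to your own setting):
   
   ```python
   # Illustrative only: a heartbeat-based liveness check with an assumed 30s threshold.
   from datetime import datetime, timedelta, timezone
   
   HEALTH_THRESHOLD = timedelta(seconds=30)
   
   def looks_alive(latest_heartbeat: datetime, now: datetime) -> bool:
       return (now - latest_heartbeat) < HEALTH_THRESHOLD
   
   last_beat = datetime.now(timezone.utc)
   after_long_loop = last_beat + timedelta(seconds=90)   # a 90s enqueue loop, no heartbeat in between
   print(looks_alive(last_beat, after_long_loop))        # False -> scheduler appears dead to the health check
   ```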
   
   Furthermore, with 2 or more schedulers (HA), setting ``max_tis_per_query`` to 512 can leave one scheduler overloaded while the other schedulers have no tasks in their executor (if ``core.parallelism`` is also very large).
   
   When tuning the parameter, I lowered ``max_tis_per_query`` to 50, and the entire cluster became much healthier. Schedulers can quickly put a batch of tasks onto the Celery broker, the scheduler heartbeat rate improved, and, more importantly, the load is shared between the 2 schedulers, as the sketch below illustrates.
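   As a rough illustration (plain Python, not Airflow code; it assumes the schedulers simply take turns claiming work, which simplifies the real row-locking between schedulers):
   
   ```python
   # Rough illustration: two schedulers alternately claiming batches of SCHEDULED TIs.
   def share_load(n_tis: int, cap: int, n_schedulers: int = 2):
       claimed = [0] * n_schedulers
       loop = 0
       while n_tis > 0:
           batch = min(cap, n_tis)                 # each scheduling loop claims at most ``cap`` TIs
           claimed[loop % n_schedulers] += batch
           n_tis -= batch
           loop += 1
       return claimed
   
   print(share_load(300, cap=512))   # [300, 0]   -> one scheduler takes everything
   print(share_load(300, cap=50))    # [150, 150] -> work spread across both schedulers
   ```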
   
   When submitting the value in this PR, I put the magic number 16 as the default for ``max_tis_per_query``. I did not choose the value I use in my own cluster, because I believe this value should be adjusted based on the hardware configuration. 16 is chosen with respect to ``core.parallelism`` (default 32): when ``core.parallelism=32``, we assume the scheduler and database hardware is not very powerful. Optimally, ``max_tis_per_query`` should be a divisor of ``core.parallelism``; 8 would be too small, so I chose 16.
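   Both knobs are ordinary Airflow config options, so the divisor heuristic (my suggestion, not an Airflow rule) can be checked against a running installation along these lines:
   
   ```python
   # Check the divisor heuristic against the live configuration.
   from airflow.configuration import conf
   
   parallelism = conf.getint("core", "parallelism")          # default 32
   max_tis = conf.getint("scheduler", "max_tis_per_query")   # 16 proposed as the new default in this PR
   
   print(f"parallelism={parallelism}, max_tis_per_query={max_tis}")
   if parallelism % max_tis != 0:
       print("note: max_tis_per_query is not a divisor of core.parallelism")
   ```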
   
   #### About ``max_tis_per_query`` in Backfill job
   
   It is used in the reset-orphaned-tasks method. ``helpers.reduce_in_chunks`` splits the entire TI list into smaller chunks (of size ``max_tis_per_query``) and queries the DB once per chunk.
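   Conceptually (a simplified stand-in, not the actual ``reduce_in_chunks`` signature), the chunked querying looks like this:
   
   ```python
   # Simplified stand-in for chunked DB access: one query per slice of ``max_tis_per_query`` TIs.
   from typing import Iterator, List
   
   def chunks(items: List[str], size: int) -> Iterator[List[str]]:
       for start in range(0, len(items), size):
           yield items[start:start + size]
   
   def reset_orphaned(tis: List[str], max_tis_per_query: int) -> int:
       queries = 0
       for chunk in chunks(tis, max_tis_per_query):
           # the real code issues one DB query per chunk; here we only count the chunks
           queries += 1
       return queries
   
   print(reset_orphaned([f"ti_{i}" for i in range(40)], max_tis_per_query=16))  # 3 queries: 16 + 16 + 8
   ```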
   
   I believe 16 is also a reasonable setting here. Since the default of ``core.parallelism`` is 32, we assume that is what the assumed hardware handles well; in that case, a query batch size of 512 would be too big for that hardware.
   
   
   

