Posted to commits@airflow.apache.org by "Taragolis (via GitHub)" <gi...@apache.org> on 2023/08/30 22:44:34 UTC

[GitHub] [airflow] Taragolis commented on issue #33647: Airflow Triggerer facing frequent restarts

Taragolis commented on issue #33647:
URL: https://github.com/apache/airflow/issues/33647#issuecomment-1699950397

   > Looks like an index hint should be needed or smth like that. Very interesting one. I will mark it for 2.7.1 hoping maybe someone will have time to fix it before
   
   To be honest, it is better to have a rule not to use `IN` with any potentially big dataset. It really makes most RDBMSs unhappy.
   
   For example, in Postgres everything in `IN` becomes part of the execution plan, and if the list is quite big, the DB spends most of its time parsing, trying to build multiple different plans, and calculating costs over a lot of different options, only to choose 'let's take something' in the end. The time spent on this analysis might be greater than even doing a full sequential scan over a couple of tables.
   
   In general, it is better to get rid of non-constant-sized `IN` filters (as opposed to constant-sized ones, e.g. a couple of statuses for tasks and DAGs) and replace them with other methods:
   - [NOT] EXISTS, for semi- and anti-joins over subqueries
   - JOIN over VALUES; in this case execution plans shouldn't be crazy. It should be supported in PG, MySQL 8 and MsSQL (RIP); maybe something similar exists for SQLite
   - Regular joins :D
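
A rough sketch of the first two alternatives, next to the `IN` form they would replace. It uses SQLite through Python's standard library only to stay self-contained; the `task` table, its columns, and the helper names are hypothetical and are not Airflow's schema or code. It shows the query shapes only and says nothing about planner behaviour on Postgres or MySQL.

```python
import sqlite3


def fetch_with_in(conn, ids):
    # The pattern discussed above: the IN list grows with the input.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, state FROM task WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


def fetch_with_values_join(conn, ids):
    # JOIN over a VALUES list, exposed here as a CTE named "wanted".
    # Assumes ids are distinct; duplicates would duplicate result rows.
    rows = ", ".join("(?)" for _ in ids)
    sql = (
        f"WITH wanted(id) AS (VALUES {rows}) "
        "SELECT t.id, t.state FROM task AS t "
        "JOIN wanted AS w ON w.id = t.id ORDER BY t.id"
    )
    return conn.execute(sql, ids).fetchall()


def fetch_with_exists(conn, ids):
    # Load the ids into a temporary table, then semi-join with EXISTS.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS wanted_ids (id INTEGER PRIMARY KEY)"
    )
    conn.execute("DELETE FROM wanted_ids")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)", [(i,) for i in ids]
    )
    sql = (
        "SELECT t.id, t.state FROM task AS t "
        "WHERE EXISTS (SELECT 1 FROM wanted_ids AS w WHERE w.id = t.id) "
        "ORDER BY t.id"
    )
    return conn.execute(sql).fetchall()


def demo():
    # Hypothetical data: ten tasks, odd ids queued, even ids running.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO task (id, state) VALUES (?, ?)",
        [(i, "queued" if i % 2 else "running") for i in range(1, 11)],
    )
    ids = [2, 3, 5, 42]  # 42 does not exist in the table
    return (
        fetch_with_in(conn, ids),
        fetch_with_values_join(conn, ids),
        fetch_with_exists(conn, ids),
    )
```

All three helpers return the same rows for the same distinct ids; the difference is only in what the database has to parse and plan. The VALUES form still binds one parameter per id, so for very large inputs the temporary-table variant (or a regular join against data already in the database) is likely the more practical shape.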
   
   @shubhransh-eb I guess you use the MySQL backend? If so, I wonder which version?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org