You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/06 06:29:23 UTC

[GitHub] [airflow] mik-laj opened a new pull request #8729: Optimize count queries on /home

mik-laj opened a new pull request #8729:
URL: https://github.com/apache/airflow/pull/8729


   I replaced 3 database queries with 1 query that [works better on MySQL](https://stackoverflow.com/questions/14754994/why-is-sqlalchemy-count-much-slower-than-the-raw-query).
   
   ```diff
   +SELECT dag.is_paused AS dag_is_paused, count(dag.dag_id) AS count_1  FROM dag  WHERE NOT dag.is_subdag AND dag.is_active GROUP BY dag.is_paused | {}
    SELECT anon_1.dag_dag_id AS anon_1_dag_dag_id, anon_1.dag_root_dag_id AS anon_1_dag_root_dag_id, anon_1.dag_is_paused AS anon_1_dag_is_paused, anon_1.dag_is_subdag AS anon_1_dag_is_subdag, anon_1.dag_is_active AS anon_1_dag_is_active, anon_1.dag_last_scheduler_run AS anon_1_dag_last_scheduler_run, anon_1.dag_last_pickled AS anon_1_dag_last_pickled, anon_1.dag_last_expired AS anon_1_dag_last_expired, anon_1.dag_scheduler_lock AS anon_1_dag_scheduler_lock, anon_1.dag_pickle_id AS anon_1_dag_pickle_id, anon_1.dag_fileloc AS anon_1_dag_fileloc, anon_1.dag_owners AS anon_1_dag_owners, anon_1.dag_description AS anon_1_dag_description, anon_1.dag_default_view AS anon_1_dag_default_view, anon_1.dag_schedule_interval AS anon_1_dag_schedule_interval, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id  FROM (SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval  FROM dag  WHERE NOT dag.is_subdag AND dag.is_active ORDER BY dag.dag_id   LIMIT %(param_1)s OFFSET %(param_2)s) AS anon_1 LEFT OUTER JOIN dag_tag AS dag_tag_1 ON anon_1.dag_dag_id = dag_tag_1.dag_id ORDER BY anon_1.dag_dag_id | {'param_1': 100, 'param_2': 0}
    SELECT DISTINCT ON (dag_tag.name) dag_tag.name AS dag_tag_name  FROM dag_tag | {}
    SELECT import_error.id AS import_error_id, import_error.timestamp AS import_error_timestamp, import_error.filename AS import_error_filename, import_error.stacktrace AS import_error_stacktrace  FROM import_error | {}
   -SELECT count(*) AS count_1  FROM (SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval  FROM dag  WHERE NOT dag.is_subdag AND dag.is_active) AS anon_1 | {}
   -SELECT count(*) AS count_1  FROM (SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval  FROM dag  WHERE NOT dag.is_subdag AND dag.is_active AND NOT dag.is_paused) AS anon_1 | {}
   -SELECT count(*) AS count_1  FROM (SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval  FROM dag  WHERE NOT dag.is_subdag AND dag.is_active AND dag.is_paused) AS anon_1 | {}
    SELECT job.id AS job_id, job.dag_id AS job_dag_id, job.state AS job_state, job.job_type AS job_job_type, job.start_date AS job_start_date, job.end_date AS job_end_date, job.latest_heartbeat AS job_latest_heartbeat, job.executor_class AS job_executor_class, job.hostname AS job_hostname, job.unixname AS job_unixname  FROM job  WHERE job.job_type IN (%(job_type_1)s) ORDER BY job.latest_heartbeat DESC   LIMIT %(param_1)s | {'job_type_1': 'SchedulerJob', 'param_1': 1}
   ```
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Target Github ISSUE in description if exists
   - [X] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org