You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2015/03/12 19:40:38 UTC

[jira] [Created] (AMBARI-10050) Querying For Requests By Task Status Has Poor Performance

Jonathan Hurley created AMBARI-10050:
----------------------------------------

             Summary: Querying For Requests By Task Status Has Poor Performance
                 Key: AMBARI-10050
                 URL: https://issues.apache.org/jira/browse/AMBARI-10050
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Critical
             Fix For: 2.0.0


When querying for the requests that are either IN_PROGRESS, FAILED or COMPLETED, the query being used is inefficient and can cause a wait of up to 10 minutes in a cluster where there is a large number of stages and tasks.

{{HostRoleCommandDAO.getRequestsByTaskStatus(...)}}

This SQL seems overly complex for what it is. Removing the nested SELECT seems like a great way to reduce the query time:

{code}
SELECT DISTINCT task.request_id as request_id
  FROM host_role_command task WHERE task.status IN ( 'COMPLETED', 'FAILED', 'TIMEDOUT', 'ABORTED')
  ORDER BY task.request_id ASC;
{code}

... or if we want to keep the NOT IN

{code}
SELECT DISTINCT task.request_id as task_id
  FROM host_role_command task WHERE task.status NOT IN ( 'QUEUED', 'IN_PROGRESS', 
                                              'PENDING', 'HOLDING', 
                                              'HOLDING_FAILED', 
                                              'HOLDING_TIMEOUT' )
  ORDER BY task.request_id ASC
  LIMIT 1000
{code}

But to be honest, my suggestion is to rewrite this as a simple query that matches from an {{EnumSet}} found in {{HostRoleStatus}}.

Essentially, the problem here is that {{getRequestsByStatus}} is trying to do the calculation work to determine the request status from task status all in SQL. This method should be broken out into 2 SQL queries:

- My above query for IN_PROGRESS or FAILED requests
- A new query for COMPLETED that looks for any requests where all tasks have completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)