You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2015/02/04 02:43:37 UTC

[jira] [Commented] (AMBARI-9334) Ambari StageDAO.findByCommandStatuses causes Postgress HIGH CPU

    [ https://issues.apache.org/jira/browse/AMBARI-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304475#comment-14304475 ] 

Jonathan Hurley commented on AMBARI-9334:
-----------------------------------------

- The above suggested query causes regressions since there is an 1 to _n_ relationship between stages and commands. If there are multiple commands for a single stage, then multiple stages are returned. There's no way to enforce a DISTINCT clause since stage includes *LOB data.

- The original query can be slightly optimized, but avoiding a join is not possible. In all cases, even when trying to use a single, unique value between the returned stages and requesting the stage entity, a join is executed by the database compiler.

- Adding a clustered index to the stage table on request_id,stage_id fixes this problem. Although it would make writes to this table slightly slower, this table is read from more often than written to. However, some databases like MySQL don't seem to honor the new clustered index; probably because there is already a compound PK on the table.

- I've also included logic that prevents the stage lookup from executing if there are no commands in progress, greatly reducing the amount that it's invoked.

{noformat:title=Updated Code}
"Hash Join  (cost=46.62..74.32 rows=135 width=16)"
"  Hash Cond: ((stage.stage_id = rolecommand.stage_id) AND (stage.request_id = rolecommand.request_id))"
"  ->  Seq Scan on stage  (cost=0.00..24.38 rows=138 width=16)"
"  ->  Hash  (cost=45.42..45.42 rows=80 width=16)"
"        ->  HashAggregate  (cost=44.62..45.42 rows=80 width=16)"
"              ->  Seq Scan on host_role_command rolecommand  (cost=0.00..43.02 rows=319 width=16)"
"                    Filter: ((status)::text = 'COMPLETED'::text)"
{noformat}


> Ambari StageDAO.findByCommandStatuses causes Postgress HIGH CPU
> ---------------------------------------------------------------
>
>                 Key: AMBARI-9334
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9334
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.0, 1.6.1, 1.7.0
>         Environment: RHEL 6.4/6.5
> postgresql-server-8.4.13-1.el6_3.x86_64
> postgresql-8.4.13-1.el6_3.x86_64
> postgresql-libs-8.4.13-1.el6_3.x86_64
> ambari-agent-1.6.1-98.x86_64
> ambari-log4j-1.6.1.98-1.noarch
> ambari-server-1.6.1-98.noarch
>            Reporter: Greg Senia
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: Screen Shot 2015-01-26 at 12.18.56 PM.png, Screen Shot 2015-01-26 at 12.19.26 PM.png, StageDAO-1.6.1.patch, StageDAO.java
>
>
> The following code that generates a query causes postgres to use lots of CPU espcially if the Ambari DB grows over time. We reduced CPU by 30-40% by fixing the code below.
> Before:
> public List<StageEntity> findByCommandStatuses(Collection<HostRoleStatus> statuses) {
> TypedQuery<StageEntity> query = entityManagerProvider.get().createQuery("SELECT stage " +
> "FROM StageEntity stage WHERE stage.stageId IN (SELECT hrce.stageId FROM " +
> "HostRoleCommandEntity hrce WHERE stage.requestId = hrce.requestId and hrce.status IN ?1 ) " +
> "ORDER BY stage.requestId, stage.stageId", StageEntity.class);
> return daoUtils.selectList(query, statuses);
> }
> After:
> @RequiresSession
> public List<StageEntity> findByCommandStatuses(Collection<HostRoleStatus> statuses) {
> TypedQuery<StageEntity> query = entityManagerProvider.get().createQuery("SELECT stage "+
> "FROM StageEntity stage, HostRoleCommandEntity hrce " +
> "WHERE stage.requestId = hrce.requestId AND stage.stageId = hrce.stageId and hrce.status IN ?1 " +
> "ORDER BY stage.requestId, stage.stageId", StageEntity.class);
> return daoUtils.selectList(query, statuses);
> }
> Before EXPLAIN ANALYZE:
> ambari=> explain analyze SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id, t0.log_info, t0.request_context, t0.request_id FROM stage t0 WHERE t0.stage_id IN (SELECT t1.stage_id FROM host_role_command t1 WHERE ((t0.request_id = t1.request_id) AND (t1.status IN ('QUEUED','IN_PROGRESS','PENDING')))) ORDER BY t0.request_id, t0.stage_id
> ;
>                                                              QUERY PLAN                                                            
> ------------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=7407488.50..7407492.69 rows=1676 width=894) (actual time=55418.086..55418.086 rows=0 loops=1)
>    Sort Key: t0.request_id, t0.stage_id
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Seq Scan on stage t0  (cost=0.00..7407398.75 rows=1676 width=894) (actual time=55418.081..55418.081 rows=0 loops=1)
>          Filter: (SubPlan 1)
>          SubPlan 1
>            ->  Seq Scan on host_role_command t1  (cost=0.00..4418.07 rows=1 width=8) (actual time=16.514..16.514 rows=0 loops=3353)
>                  Filter: (($0 = request_id) AND ((status)::text = ANY ('{QUEUED,IN_PROGRESS,PENDING}'::text[])))
> Total runtime: 55418.123 ms
> (9 rows)
> After: Explain Analyze:
> ambari=> explain analyze SELECT t0.stage_id, t0.cluster_host_info, t0.cluster_id, t0.log_info, t0.request_context, t0.request_id FROM stage t0, host_role_command t1 WHERE ((t0.request_id = t1.request_id) AND t0.stage_id =t1.stage_id and (t1.status IN ('QUEUED','IN_PROGRESS','PENDING'))) ORDER BY t0.stage_id,t0.request_id
> ;
>                                                            QUERY PLAN                                                          
> --------------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=4346.51..4346.52 rows=2 width=894) (actual time=53.605..53.605 rows=0 loops=1)
>    Sort Key: t0.stage_id, t0.request_id
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Nested Loop  (cost=0.00..4346.50 rows=2 width=894) (actual time=53.596..53.596 rows=0 loops=1)
>          ->  Seq Scan on host_role_command t1  (cost=0.00..4338.22 rows=1 width=16) (actual time=53.595..53.595 rows=0 loops=1)
>                Filter: ((status)::text = ANY ('{QUEUED,IN_PROGRESS,PENDING}'::text[]))
>          ->  Index Scan using stage_pkey on stage t0  (cost=0.00..8.27 rows=1 width=894) (never executed)
>                Index Cond: ((t0.stage_id = t1.stage_id) AND (t0.request_id = t1.request_id))
> Total runtime: 53.654 ms
> (9 rows)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)