Posted to dev@ambari.apache.org by Jonathan Hurley <jh...@hortonworks.com> on 2015/08/06 06:25:56 UTC

Review Request 37161: Cluster creates fail on larger deployments with SQL Azure DB

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37161/
-----------------------------------------------------------

Review request for Ambari, Myroslav Papirkovskyy, Sumit Mohanty, and Sid Wagle.


Bugs: AMBARI-12657
    https://issues.apache.org/jira/browse/AMBARI-12657


Repository: ambari


Description
-------

We started doing larger cluster creates (48 worker nodes) with SQL Azure DB as the Ambari DB, and we are seeing HTTP GET requests time out on the client side (even after retries), resulting in a 15% cluster create failure rate. This is a tracking Jira to resolve the CRUD failures.

In some of my experiments with 48-node clusters, DB CPU usage goes above 50%, which might explain why SQL is running slowly.

Basically, this one query consumes most of the CPU. The query plan is also attached.
```
SELECT DISTINCT t0.request_id FROM host_role_command t0 WHERE NOT EXISTS (SELECT @P0 FROM host_role_command t1 WHERE (t1.status IN (@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9)))  ORDER BY t0.request_id ASC
```

There's no need to query the same table twice here; we can eliminate the inner SELECT and use a `NOT IN` clause instead.
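To make the semantic difference between the two forms concrete, here is a minimal in-memory sketch in Java. It is not the actual HostRoleCommandDAO code. It assumes the intended meaning of the query is per request, i.e. the subquery is correlated on `request_id` (the captured SQL above does not show that predicate), and that the `NOT IN` rewrite filters each `t0` row by its own status. The status names in `NOT_COMPLETED` are assumptions standing in for the values bound to @P1..@P9. `completedViaNotExists` keeps a request only if none of its tasks is unfinished; `completedViaNotIn` keeps any request with at least one finished task, so it can return requests that still have tasks in progress.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CompletedRequestsSketch {

    // One row of a simplified host_role_command table (hypothetical model).
    record Task(long requestId, String status) {}

    // Assumed "not completed" statuses; the real values bound to @P1..@P9
    // are not shown in the captured query.
    static final Set<String> NOT_COMPLETED = Set.of("PENDING", "QUEUED", "IN_PROGRESS");

    // Correlated anti-join: keep request r only if NO task of r is unfinished.
    static List<Long> completedViaNotExists(List<Task> tasks) {
        TreeSet<Long> result = new TreeSet<>(); // DISTINCT + ORDER BY request_id
        for (Task t0 : tasks) {
            boolean hasUnfinished = tasks.stream().anyMatch(t1 ->
                    t1.requestId() == t0.requestId()
                            && NOT_COMPLETED.contains(t1.status()));
            if (!hasUnfinished) {
                result.add(t0.requestId());
            }
        }
        return List.copyOf(result);
    }

    // Row-level filter: keep the request of any row whose own status
    // is NOT IN the unfinished set.
    static List<Long> completedViaNotIn(List<Task> tasks) {
        TreeSet<Long> result = new TreeSet<>();
        for (Task t0 : tasks) {
            if (!NOT_COMPLETED.contains(t0.status())) {
                result.add(t0.requestId());
            }
        }
        return List.copyOf(result);
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task(1, "COMPLETED"), new Task(1, "COMPLETED"),
                new Task(2, "COMPLETED"), new Task(2, "IN_PROGRESS"),
                new Task(3, "PENDING"));
        System.out.println(completedViaNotExists(tasks)); // [1]
        System.out.println(completedViaNotIn(tasks));     // [1, 2]
    }
}
```

With the sample tasks in `main`, request 2 has one COMPLETED and one IN_PROGRESS task, so only the row-level `NOT IN` filter reports it. The nested loop is quadratic and is only meant to show the semantics, not the performance.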


Diffs
-----

  ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog211Test.java 2ba44bf 

Diff: https://reviews.apache.org/r/37161/diff/


Testing
-------

mvn clean test

Tests run: 3112, Failures: 0, Errors: 0, Skipped: 23

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:02 h
[INFO] Finished at: 2015-08-05T21:21:52-04:00
[INFO] Final Memory: 29M/847M

Verified that the new SQL works on all supported databases.


Thanks,

Jonathan Hurley


Re: Review Request 37161: Cluster creates fail on larger deployments with SQL Azure DB

Posted by Sid Wagle <sw...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37161/#review94351
-----------------------------------------------------------

Ship it!

- Sid Wagle


On Aug. 6, 2015, 4:27 a.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37161/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2015, 4:27 a.m.)
> 
> 
> Review request for Ambari, Myroslav Papirkovskyy, Sumit Mohanty, and Sid Wagle.
> 
> 
> Bugs: AMBARI-12657
>     https://issues.apache.org/jira/browse/AMBARI-12657
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> We started doing larger cluster creates (48 worker nodes) with SQL Azure DB as the Ambari DB, and we are seeing HTTP GET requests time out on the client side (even after retries), resulting in a 15% cluster create failure rate. This is a tracking Jira to resolve the CRUD failures.
> 
> In some of my experiments with 48-node clusters, DB CPU usage goes above 50%, which might explain why SQL is running slowly.
> 
> Basically, this one query consumes most of the CPU. The query plan is also attached.
> ```
> SELECT DISTINCT t0.request_id FROM host_role_command t0 WHERE NOT EXISTS (SELECT @P0 FROM host_role_command t1 WHERE (t1.status IN (@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9)))  ORDER BY t0.request_id ASC
> ```
> 
> There's no need to query the same table twice here; we can eliminate the inner SELECT and use a `NOT IN` clause instead.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java a72e1fe 
> 
> Diff: https://reviews.apache.org/r/37161/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> Tests run: 3112, Failures: 0, Errors: 0, Skipped: 23
> 
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:02 h
> [INFO] Finished at: 2015-08-05T21:21:52-04:00
> [INFO] Final Memory: 29M/847M
> 
> Verified that the new SQL works on all supported databases.
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>


Re: Review Request 37161: Cluster creates fail on larger deployments with SQL Azure DB

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37161/
-----------------------------------------------------------

(Updated Aug. 7, 2015, 2:38 p.m.)


Review request for Ambari, Myroslav Papirkovskyy, Sumit Mohanty, and Sid Wagle.


Changes
-------

The original patch had an error: it returned requests that still had tasks in progress. A few changes were made to fix this problem:
- The SQL was changed back to the original, so it still does a nested SELECT
- The query for COMPLETED requests is now only made if the map is empty
- The executor no longer does the nested SELECT at all; since this map is relatively small, it retrieves the requests (which may be cached) and uses their calculated status.
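The third change can be pictured roughly as follows. This is a hypothetical Java sketch, not the ServerActionExecutor code: it assumes the map holds per-request data keyed by request ID that should be dropped once a request finishes, and `Request`, `isCompleted()`, `removeCompleted` and the `lookup` function are illustrative names. Instead of running the nested SELECT, each request in the (small) map is fetched through `lookup`, which stands in for a possibly cached accessor, and its status is calculated from its task statuses.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.LongFunction;

final class SharedDataCleanupSketch {

    // Hypothetical task statuses that mean "still running".
    static final Set<String> NOT_COMPLETED = Set.of("PENDING", "QUEUED", "IN_PROGRESS");

    // Hypothetical request view: the statuses of the request's tasks.
    record Request(long id, List<String> taskStatuses) {
        // Calculated status: completed once no task is still running.
        boolean isCompleted() {
            return taskStatuses.stream().noneMatch(NOT_COMPLETED::contains);
        }
    }

    // Drops map entries whose request has completed. `lookup` replaces the
    // nested SELECT with one (possibly cached) lookup per map entry.
    static void removeCompleted(Map<Long, Object> sharedDataByRequest,
                                LongFunction<Request> lookup) {
        Iterator<Map.Entry<Long, Object>> it = sharedDataByRequest.entrySet().iterator();
        while (it.hasNext()) {
            Request request = lookup.apply(it.next().getKey());
            // Assumption: a request the lookup no longer knows is treated as finished.
            if (request == null || request.isCompleted()) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, Request> cache = Map.of(
                1L, new Request(1, List.of("COMPLETED", "COMPLETED")),
                2L, new Request(2, List.of("COMPLETED", "IN_PROGRESS")));
        Map<Long, Object> shared = new HashMap<>();
        shared.put(1L, "data-1");
        shared.put(2L, "data-2");
        removeCompleted(shared, cache::get);
        System.out.println(shared.keySet()); // [2]
    }
}
```

The cost here scales with the size of the map rather than with the size of `host_role_command`, which is what makes the approach attractive when the map is small.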


Bugs: AMBARI-12657
    https://issues.apache.org/jira/browse/AMBARI-12657


Repository: ambari


Description
-------

We started doing larger cluster creates (48 worker nodes) with SQL Azure DB as the Ambari DB, and we are seeing HTTP GET requests time out on the client side (even after retries), resulting in a 15% cluster create failure rate. This is a tracking Jira to resolve the CRUD failures.

In some of my experiments with 48-node clusters, DB CPU usage goes above 50%, which might explain why SQL is running slowly.

Basically, this one query consumes most of the CPU. The query plan is also attached.
```
SELECT DISTINCT t0.request_id FROM host_role_command t0 WHERE NOT EXISTS (SELECT @P0 FROM host_role_command t1 WHERE (t1.status IN (@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9)))  ORDER BY t0.request_id ASC
```

There's no need to query the same table twice here; we can eliminate the inner SELECT and use a `NOT IN` clause instead.


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java a72e1fe 
  ambari-server/src/main/java/org/apache/ambari/server/serveraction/ServerActionExecutor.java 49c031f 
  ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java 8def61f 

Diff: https://reviews.apache.org/r/37161/diff/


Testing
-------

mvn clean test

Tests run: 3112, Failures: 0, Errors: 0, Skipped: 23

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:02 h
[INFO] Finished at: 2015-08-05T21:21:52-04:00
[INFO] Final Memory: 29M/847M

Verified that the new SQL works on all supported databases.


Thanks,

Jonathan Hurley


Re: Review Request 37161: Cluster creates fail on larger deployments with SQL Azure DB

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37161/
-----------------------------------------------------------

(Updated Aug. 6, 2015, 12:27 a.m.)


Review request for Ambari, Myroslav Papirkovskyy, Sumit Mohanty, and Sid Wagle.


Changes
-------

Attaching the correct patch.


Bugs: AMBARI-12657
    https://issues.apache.org/jira/browse/AMBARI-12657


Repository: ambari


Description
-------

We started doing larger cluster creates (48 worker nodes) with SQL Azure DB as the Ambari DB, and we are seeing HTTP GET requests time out on the client side (even after retries), resulting in a 15% cluster create failure rate. This is a tracking Jira to resolve the CRUD failures.

In some of my experiments with 48-node clusters, DB CPU usage goes above 50%, which might explain why SQL is running slowly.

Basically, this one query consumes most of the CPU. The query plan is also attached.
```
SELECT DISTINCT t0.request_id FROM host_role_command t0 WHERE NOT EXISTS (SELECT @P0 FROM host_role_command t1 WHERE (t1.status IN (@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9)))  ORDER BY t0.request_id ASC
```

There's no need to query the same table twice here; we can eliminate the inner SELECT and use a `NOT IN` clause instead.


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java a72e1fe 

Diff: https://reviews.apache.org/r/37161/diff/


Testing
-------

mvn clean test

Tests run: 3112, Failures: 0, Errors: 0, Skipped: 23

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:02 h
[INFO] Finished at: 2015-08-05T21:21:52-04:00
[INFO] Final Memory: 29M/847M

Verified that the new SQL works on all supported databases.


Thanks,

Jonathan Hurley