You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@cloudstack.apache.org by "Sowmya Krishnan (JIRA)" <ji...@apache.org> on 2015/05/20 06:22:00 UTC

[jira] [Updated] (CLOUDSTACK-8485) listAPIs are taking too long to return results

     [ https://issues.apache.org/jira/browse/CLOUDSTACK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sowmya Krishnan updated CLOUDSTACK-8485:
----------------------------------------
    Description: 
listAPIs are taking significantly longer than before (4.2.x)

I tried out few listAPI calls using a simulator set up with ~ 10K VMs and 8K Routers. Here are few results:
listVirtualMachines is taking > 25 sec to return with pagesize set to 50. This is in comparison to 2 sec in earlier cases such as 4.2.

listVolumes with pagesize = 1000 took more than 10 mins and finally times out.

Further observations show that there are also lot of slow queries being logged in catalina.out and in MySQL slow query logs. I am not sure if this could be the reason for DB performance getting impacted in turn causing an impact on listAPIs too.

Here's a sample of slow queries from catalina.out:

Mon May 11 07:31:15 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 3305 ms, connection-id: 637759, statement-id: 3218312, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 3,305 ms):SELECT user_vm_details.id, user_vm_details.vm_id, user_vm_details.name, user_vm_details.value, user_vm_details.display FROM user_vm_details WHERE user_vm_details.vm_id = 9117Mon May 11 07:31:15 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 3305 ms, connection-id: 637843, statement-id: 3218311, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 3,305 ms):SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 345  AND host.removed IS NULL
 Mon May 11 07:31:17 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 6458 ms, connection-id: 637623, statement-id: 3218243, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 6,458 ms):SELECT storage_pool_host_ref.host_id FROM storage_pool_host_ref  INNER JOIN host ON storage_pool_host_ref.host_id=host.id WHERE storage_pool_host_ref.pool_id = 197  AND  (host.status = 'Up'  AND host.resource_state = 'Enabled' )Mon May 11 07:31:17 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 2402 ms, connection-id: 637754, statement-id: 3218371, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 2,402 ms):SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 669  AND host.removed IS NULL

The following is from MySQL slow query log:

SELECT vm_instance.id, vm_instance.name, vm_instance.vnc_password, vm_instance.proxy_id, vm_instance.proxy_assign_time, vm_instance.state, vm_instance.private_ip_address, vm_instance.instance_name, vm_instance.vm_template_id, vm_instance.guest_os_id, vm_instance.host_id, vm_instance.last_host_id, vm_instance.pod_id, vm_instance.private_mac_address, vm_instance.data_center_id, vm_instance.vm_type, vm_instance.ha_enabled, vm_instance.display_vm, vm_instance.limit_cpu_use, vm_instance.update_count, vm_instance.created, vm_instance.removed, vm_instance.update_time, vm_instance.domain_id, vm_instance.account_id, vm_instance.service_offering_id, vm_instance.reservation_id, vm_instance.hypervisor_type, vm_instance.dynamically_scalable, vm_instance.uuid, vm_instance.disk_offering_id, vm_instance.power_state, vm_instance.power_state_update_time, vm_instance.power_state_update_count, vm_instance.power_host FROM vm_instance WHERE vm_instance.instance_name = _binary'r-16916-VM'  AND vm_instance.removed IS NULL  ORDER BY RAND() LIMIT 1;# User@Host: cloud[cloud] @ x3 [10.81.28.128]  Id: 637881# Query_time: 8.193784  Lock_time: 0.000107 Rows_sent: 1  Rows_examined: 19935SET timestamp=1431329557;

SET timestamp=1431329601;SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 913  AND host.removed IS NULL;# User@Host: cloud[cloud] @ x3 [10.81.28.128]  Id: 637865# Query_time: 5.861241  Lock_time: 0.000139 Rows_sent: 1  Rows_examined: 1




  was:
listAPIs are taking significantly longer than before (4.2.x)

I tried out few listAPI calls using a simulator set up with ~ 10K VMs and 8K Routers. Here are few results:
listVirtualMachines is taking > 25 sec to return with pagesize set to 25. This is in comparison to 2 sec in earlier cases such as 4.2.

listVolumes with pagesize = 1000 took more than 10 mins and finally times out.

Further observations show that there are also lot of slow queries being logged in catalina.out and in MySQL slow query logs. I am not sure if this could be the reason for DB performance getting impacted in turn causing an impact on listAPIs too.

Here's a sample of slow queries from catalina.out:

Mon May 11 07:31:15 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 3305 ms, connection-id: 637759, statement-id: 3218312, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 3,305 ms):SELECT user_vm_details.id, user_vm_details.vm_id, user_vm_details.name, user_vm_details.value, user_vm_details.display FROM user_vm_details WHERE user_vm_details.vm_id = 9117Mon May 11 07:31:15 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 3305 ms, connection-id: 637843, statement-id: 3218311, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 3,305 ms):SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 345  AND host.removed IS NULL
 Mon May 11 07:31:17 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 6458 ms, connection-id: 637623, statement-id: 3218243, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 6,458 ms):SELECT storage_pool_host_ref.host_id FROM storage_pool_host_ref  INNER JOIN host ON storage_pool_host_ref.host_id=host.id WHERE storage_pool_host_ref.pool_id = 197  AND  (host.status = 'Up'  AND host.resource_state = 'Enabled' )Mon May 11 07:31:17 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 2402 ms, connection-id: 637754, statement-id: 3218371, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 2,402 ms):SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 669  AND host.removed IS NULL

The following is from MySQL slow query log:

SELECT vm_instance.id, vm_instance.name, vm_instance.vnc_password, vm_instance.proxy_id, vm_instance.proxy_assign_time, vm_instance.state, vm_instance.private_ip_address, vm_instance.instance_name, vm_instance.vm_template_id, vm_instance.guest_os_id, vm_instance.host_id, vm_instance.last_host_id, vm_instance.pod_id, vm_instance.private_mac_address, vm_instance.data_center_id, vm_instance.vm_type, vm_instance.ha_enabled, vm_instance.display_vm, vm_instance.limit_cpu_use, vm_instance.update_count, vm_instance.created, vm_instance.removed, vm_instance.update_time, vm_instance.domain_id, vm_instance.account_id, vm_instance.service_offering_id, vm_instance.reservation_id, vm_instance.hypervisor_type, vm_instance.dynamically_scalable, vm_instance.uuid, vm_instance.disk_offering_id, vm_instance.power_state, vm_instance.power_state_update_time, vm_instance.power_state_update_count, vm_instance.power_host FROM vm_instance WHERE vm_instance.instance_name = _binary'r-16916-VM'  AND vm_instance.removed IS NULL  ORDER BY RAND() LIMIT 1;# User@Host: cloud[cloud] @ x3 [10.81.28.128]  Id: 637881# Query_time: 8.193784  Lock_time: 0.000107 Rows_sent: 1  Rows_examined: 19935SET timestamp=1431329557;

SET timestamp=1431329601;SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 913  AND host.removed IS NULL;# User@Host: cloud[cloud] @ x3 [10.81.28.128]  Id: 637865# Query_time: 5.861241  Lock_time: 0.000139 Rows_sent: 1  Rows_examined: 1





> listAPIs are taking too long to return results
> ----------------------------------------------
>
>                 Key: CLOUDSTACK-8485
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8485
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>          Components: Management Server
>    Affects Versions: 4.5.1
>            Reporter: Sowmya Krishnan
>
> listAPIs are taking significantly longer than before (4.2.x)
> I tried out few listAPI calls using a simulator set up with ~ 10K VMs and 8K Routers. Here are few results:
> listVirtualMachines is taking > 25 sec to return with pagesize set to 50. This is in comparison to 2 sec in earlier cases such as 4.2.
> listVolumes with pagesize = 1000 took more than 10 mins and finally times out.
> Further observations show that there are also lot of slow queries being logged in catalina.out and in MySQL slow query logs. I am not sure if this could be the reason for DB performance getting impacted in turn causing an impact on listAPIs too.
> Here's a sample of slow queries from catalina.out:
> Mon May 11 07:31:15 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 3305 ms, connection-id: 637759, statement-id: 3218312, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 3,305 ms):SELECT user_vm_details.id, user_vm_details.vm_id, user_vm_details.name, user_vm_details.value, user_vm_details.display FROM user_vm_details WHERE user_vm_details.vm_id = 9117Mon May 11 07:31:15 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 3305 ms, connection-id: 637843, statement-id: 3218311, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 3,305 ms):SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 345  AND host.removed IS NULL
>  Mon May 11 07:31:17 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 6458 ms, connection-id: 637623, statement-id: 3218243, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 6,458 ms):SELECT storage_pool_host_ref.host_id FROM storage_pool_host_ref  INNER JOIN host ON storage_pool_host_ref.host_id=host.id WHERE storage_pool_host_ref.pool_id = 197  AND  (host.status = 'Up'  AND host.resource_state = 'Enabled' )Mon May 11 07:31:17 UTC 2015 INFO: Profiler Event: [SLOW QUERY]         at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96) duration: 2402 ms, connection-id: 637754, statement-id: 3218371, resultset-id: 0, message: Slow query (exceeded 2,000 ms, duration: 2,402 ms):SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 669  AND host.removed IS NULL
> The following is from MySQL slow query log:
> SELECT vm_instance.id, vm_instance.name, vm_instance.vnc_password, vm_instance.proxy_id, vm_instance.proxy_assign_time, vm_instance.state, vm_instance.private_ip_address, vm_instance.instance_name, vm_instance.vm_template_id, vm_instance.guest_os_id, vm_instance.host_id, vm_instance.last_host_id, vm_instance.pod_id, vm_instance.private_mac_address, vm_instance.data_center_id, vm_instance.vm_type, vm_instance.ha_enabled, vm_instance.display_vm, vm_instance.limit_cpu_use, vm_instance.update_count, vm_instance.created, vm_instance.removed, vm_instance.update_time, vm_instance.domain_id, vm_instance.account_id, vm_instance.service_offering_id, vm_instance.reservation_id, vm_instance.hypervisor_type, vm_instance.dynamically_scalable, vm_instance.uuid, vm_instance.disk_offering_id, vm_instance.power_state, vm_instance.power_state_update_time, vm_instance.power_state_update_count, vm_instance.power_host FROM vm_instance WHERE vm_instance.instance_name = _binary'r-16916-VM'  AND vm_instance.removed IS NULL  ORDER BY RAND() LIMIT 1;# User@Host: cloud[cloud] @ x3 [10.81.28.128]  Id: 637881# Query_time: 8.193784  Lock_time: 0.000107 Rows_sent: 1  Rows_examined: 19935SET timestamp=1431329557;
> SET timestamp=1431329601;SELECT host.id, host.disconnected, host.name, host.status, host.type, host.private_ip_address, host.private_mac_address, host.private_netmask, host.public_netmask, host.public_ip_address, host.public_mac_address, host.storage_ip_address, host.cluster_id, host.storage_netmask, host.storage_mac_address, host.storage_ip_address_2, host.storage_netmask_2, host.storage_mac_address_2, host.hypervisor_type, host.proxy_port, host.resource, host.fs_type, host.available, host.setup, host.resource_state, host.hypervisor_version, host.update_count, host.uuid, host.data_center_id, host.pod_id, host.cpu_sockets, host.cpus, host.url, host.speed, host.ram, host.parent, host.guid, host.capabilities, host.total_size, host.last_ping, host.mgmt_server_id, host.dom0_memory, host.version, host.created, host.removed FROM host WHERE host.id = 913  AND host.removed IS NULL;# User@Host: cloud[cloud] @ x3 [10.81.28.128]  Id: 637865# Query_time: 5.861241  Lock_time: 0.000139 Rows_sent: 1  Rows_examined: 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)