Posted to dev@cloudstack.apache.org by Ivan Kudryavtsev <ku...@bw-sw.com> on 2018/01/06 06:28:13 UTC

Found 2 bugs in ACS 4.10. Possibly exist in 4.11 (master)

Hello, colleagues.

Over the last few days I found 2 bugs which I believe are critical for the 4.11
release. I would like to share them here and get help if possible:

1. CPVM bug. I use a wildcard certificate issued by the Comodo CA. I uploaded it
to CloudStack via the UI and destroyed the CPVM to force it to pick the
certificate up. It works like a charm at first, but after some amount of time
the CPVM loses it and a console proxy connection is no longer possible. After
the CPVM is rebooted or recreated, everything works well again. I'm not familiar
with the CPVM at all and cannot even imagine what could be wrong here.
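
To check whether the CPVM is still presenting the uploaded certificate when this
happens, something like the following should show what it actually serves (a rough
sketch; the host name below is a placeholder for the console proxy endpoint, and
port 443 assumes SSL is enabled):

# Print subject, issuer and validity of the certificate the console proxy serves right now
echo | openssl s_client -connect consoleproxy.example.com:443 -servername consoleproxy.example.com 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates

If the certificate shown changes (or the TLS handshake stops working altogether)
right when the consoles break, that would at least narrow down where it loses it.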

1a. The CPVM has debug logging enabled and its logs include tons of messages like:

2018-01-06 06:13:57,069 DEBUG
[cloud.consoleproxy.ConsoleProxyAjaxImageHandler] (Thread-4159:null)
AjaxImageHandler
/ajaximg?token=RcHSrvzegyrjZAlc1Wjifcwv9P8WwK3eH63SuIS8WFFGssxymmjdYkZ4-S4ilY1UHxX612Lt_5Xi1Z5JaoCfDSf_UCi8lTIsPEBlDpUEWQg1IblYu0HxvoDugX9J4XgAdpj74qg_U4pOs74dzdZFB50PB_HxcMhzUqd5plH914PmRDw5k0ONaa183CsGa7DcGVvWaR_eYP_8_CArahGAjHt04Kx227tjyMx4Zaju7iNyxpBWxtBC5YJyj8rjv7IeA_0Pevz91pWn6OE1pkeLwGeFSV8pZw4BWg95SG97A-I&key=2020&ts=1515219237015
2018-01-06 06:13:57,070 DEBUG
[cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null)
decode token. host: 10.252.2.10
2018-01-06 06:13:57,070 DEBUG
[cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null)
decode token. port: 5903
2018-01-06 06:13:57,070 DEBUG
[cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null)
decode token. tag: 375c62b5-74d9-4494-8b79-0d7c76cff10f

Every opened session is dumped to the logs. I suppose this is dangerous and could
lead to filesystem overuse and CPVM failure.

/dev/vda10                                              368M   63M  287M
19% /var/log
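
As a stopgap it is possible to check which file is actually growing inside the
CPVM and drop the logging from DEBUG to INFO (a sketch; the log4j config location
and format are assumptions and may differ between system VM templates):

# Find the largest logs under /var/log
du -ah /var/log 2>/dev/null | sort -rh | head -n 10

# Locate the log4j config used on the system VM and switch DEBUG to INFO
# (assumed location; verify the file before editing, then restart the proxy or reboot the CPVM)
LOG4J=$(find /usr/local/cloud /etc -name 'log4j*.xml' 2>/dev/null | head -n 1)
[ -n "$LOG4J" ] && sed -i 's/DEBUG/INFO/g' "$LOG4J"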

Might it be that (1) is a consequence of (1a)?

2. High CPU utilization bug. Right after the management server is launched it uses
almost no CPU, because this is a development cloud. After two days I see 2 cores
50% used by management server processes, and several days ago I saw the management
server processes utilize almost all of the available CPU. Surprisingly, it continues
to function (API, UI), and there is no active API usage in the logs.

The only two suspicious things I found for the last incident are:

root@cs2-head1:/var/log/cloudstack/management# zgrep ERROR
management-server.log.2018-01-04.gz
2018-01-04 12:58:23,391 ERROR [c.c.c.ClusterManagerImpl]
(localhost-startStop-1:null) (logid:) Unable to ping management server at
10.252.2.2:9090 due to ConnectException
2018-01-04 12:58:25,743 ERROR [c.c.u.PropertiesUtil]
(localhost-startStop-1:null) (logid:) Unable to find properties file:
commands.properties
2018-01-04 14:36:23,874 ERROR [c.c.u.PropertiesUtil]
(localhost-startStop-1:null) (logid:) Unable to find properties file:
commands.properties
2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188 ctx-b1887051)
(logid:be4b64e0) Invocation exception, caused by:
com.cloud.exception.InsufficientServerCapacityException: Unable to create a
deployment for VM[SecondaryStorageVm|s-24-VM]Scope=interface
com.cloud.dc.DataCenter; id=1
2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy]
(Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185 ctx-83290fa8)
(logid:65010252) Invocation exception, caused by:
com.cloud.exception.InsufficientServerCapacityException: Unable to create a
deployment for VM[ConsoleProxy|v-10-VM]Scope=interface
com.cloud.dc.DataCenter; id=1
2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher]
(Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188) (logid:be4b64e0)
Unable to complete AsyncJobVO {id:38188, userId: 1, accountId: 1,
instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart,
cmdInfo:
rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGHQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA,
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated:
null, lastPolled: null, created: Thu Jan 04 14:43:22 KRAT 2018}, job
origin:38158
2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher]
(Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185) (logid:65010252)
Unable to complete AsyncJobVO {id:38185, userId: 1, accountId: 1,
instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart,
cmdInfo:
rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAACnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA,
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated:
null, lastPolled: null, created: Thu Jan 04 14:43:21 KRAT 2018}, job
origin:38155
2018-01-04 14:43:25,127 ERROR
[o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl]
(consoleproxy-1:ctx-6f2f9b7b) (logid:25acd369) No remote endpoint to send
DeleteCommand, check if host or ssvm is down?
2018-01-04 14:43:25,127 ERROR
[o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl]
(secstorage-1:ctx-ae3adf87) (logid:4db1e2a0) No remote endpoint to send
DeleteCommand, check if host or ssvm is down?


The worst thing is that I don't even have an idea how to catch it. Also, since my
second management server is down, I see a lot of events like:
2017-12-28 05:08:34,927 DEBUG [c.c.c.ClusterManagerImpl]
(Cluster-Heartbeat-1:ctx-9bbae21c) (logid:9a1b0b21) Management server
heartbeat takes too long to finish. profiler: Done. Duration: 1935ms,
profilerHeartbeatUpdate: Done. Duration: 617ms, profilerPeerScan: Done.
Duration: 1317ms

Could this be a reason for the high CPU utilization?
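
One way to try to catch it next time (a rough sketch using standard JDK tools; the
process pattern below is an assumption for a packaged install) is to map the busiest
native threads of the management server JVM onto Java stack frames:

# 1. Management server JVM pid (adjust the pattern to whatever your java process looks like)
MSPID=$(pgrep -f cloudstack-management | head -n 1)

# 2. Per-thread CPU usage of that JVM; note the PID column of the hottest threads
top -b -H -n 1 -p "$MSPID" | head -n 40

# 3. Thread dump, then look the hot thread up by its native id in hex (nid=0x...)
jstack "$MSPID" > /tmp/ms-threads.txt
TID=12345   # replace with the busy thread id from step 2
grep -A 20 "nid=$(printf '0x%x' "$TID")" /tmp/ms-threads.txt

Whatever stack keeps showing up for the hot threads should say whether it is the
heartbeat/cluster scanning or something else entirely.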

-- 
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>

Re: Found 2 bugs in ACS 4.10. Possibly exist in 4.11 (master)

Posted by Ivan Kudryavtsev <ku...@bw-sw.com>.
Additional info. In the CPVM logs (after SSHing into it) I found:

2018-01-06 06:34:03,631 ERROR [consoleproxy.vnc.VncServerPacketReceiver]
(Thread-13:null) Unexpected exception:
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:224)
        at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
        at com.cloud.consoleproxy.vnc.VncServerPacketReceiver.run(VncServerPacketReceiver.java:61)
        at com.cloud.consoleproxy.vnc.VncClient.doConnect(VncClient.java:190)
        at com.cloud.consoleproxy.vnc.VncClient.connectTo(VncClient.java:156)
        at com.cloud.consoleproxy.ConsoleProxyVncClient$1.run(ConsoleProxyVncClient.java:99)
        at java.lang.Thread.run(Thread.java:748)
2018-01-06 06:34:03,645 INFO  [consoleproxy.vnc.VncServerPacketReceiver]
(Thread-13:null) Receiving thread exit processing, shutdown connection
2018-01-06 06:34:03,645 INFO  [cloud.consoleproxy.ConsoleProxyVncClient]
(Thread-13:null) Receiver thread stopped.
2018-01-06 06:34:03,645 INFO  [cloud.consoleproxy.ConsoleProxyVncClient]
(Thread-13:null) Received client close indication. remove viewer from map.
2018-01-06 06:34:03,672 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null)
Received response: Seq 49-263:  { Ans: , MgmtId: 2485138019287, via: 49,
Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2018-01-06 06:34:04,453 INFO  [consoleproxy.vnc.VncClientPacketSender]
(Thread-14:null) Sending thread exit processing, shutdown connection

It seems that after this exception it stops functioning.
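
When it gets into that state, before rebooting it might be worth checking from
inside the CPVM whether the proxy JVM is still alive and still listening (a sketch;
the ports and the log path are assumptions based on the defaults):

# Is the console proxy java process still running?
ps aux | grep '[j]ava'

# Is anything still listening on the console ports (80/443 by default)?
ss -tlnp | grep -E ':(80|443)\b'

# Any further exceptions after the SocketException above?
# (assumed path; use whichever file the DEBUG/ERROR lines are actually written to)
tail -n 200 /var/log/cloud/cloud.out

If the process is still up but nothing listens any more, that would point at the
proxy side rather than at the certificate itself.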



-- 
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>

RE: Found 2 bugs in ACS 4.10. Possibly exist in 4.11 (master)

Posted by Sean Lair <sl...@ippathways.com>.
I use a wildcard cert on 4.9.2 and it's fine. We haven't gone to 4.10 yet to test. We'll probably go straight to 4.11 when released.

We have also had the high CPU on the mgmt servers in our 4.9.x deployments. It is very frustrating; it also happens every few days. We haven't been able to track down why yet. In a different thread a while ago, Simon Weller also reported high CPU issues, but I'm not sure if he ever found the culprit either.



-----Original Message-----
From: Ivan Kudryavtsev [mailto:kudryavtsev_ia@bw-sw.com] 
Sent: Saturday, January 6, 2018 12:28 AM
To: dev <de...@cloudstack.apache.org>
Subject: Found 2 bugs in ACS 4.10. Possibly exist in 4.11 (master)
