Posted to users@cloudstack.apache.org by Douglas Land <ds...@looprock.com> on 2016/07/05 16:15:17 UTC

Issue with 'stuck' virtual routers

We pulled a host from the pool for upgrades, and in the process seem to
have gotten a virtual router into an odd state. It's showing as destroyed in
the UI, but cloudmonkey says it's still expunging.

This host has been completely rebuilt, including being completely redisked.
On the management node I found:

mysql> select * from op_ha_work;
+----+-------------+-----------+--------------+-----------+----------------+---------+---------------------+-------+-------+-----------+-------------+---------+
| id | instance_id | type      | vm_type      | state     | mgmt_server_id | host_id | created             | tried | taken | step      | time_to_try | updated |
+----+-------------+-----------+--------------+-----------+----------------+---------+---------------------+-------+-------+-----------+-------------+---------+
|  1 |          13 | Migration | DomainRouter | Expunging |           NULL |      24 | 2016-07-01 14:34:17 |     0 | NULL  | Migrating |  1433332034 |     205 |
|  4 |          78 | Migration | DomainRouter | Destroyed |           NULL |      24 | 2016-07-01 14:34:17 |     0 | NULL  | Migrating |  1433332092 |      68 |
+----+-------------+-----------+--------------+-----------+----------------+---------+---------------------+-------+-------+-----------+-------------+---------+

I removed those entries, but the routers persist. Via cloudmonkey it
shows expunging:
{
  "count": 1,
  "router": [
    {
      "account": "engineering",
      "created": "2014-09-05T03:56:07+0200",
      "dns1": "172.16.8.46",
      "dns2": "172.16.8.47",
      "domain": "engineering",
      "domainid": "1da498ba-5646-4cc3-a704-a20ebe12f518",
      "id": "dc48a402-41d8-4e93-b441-4b34eb83a4c8",
      "isredundantrouter": true,
      "name": "r-78-VM",
      "nic": [],
      "podid": "f53afa8d-51ff-484d-9a88-52e979aeb688",
      "redundantstate": "UNKNOWN",
      "requiresupgrade": false,
      "role": "VIRTUAL_ROUTER",
      "serviceofferingid": "ed6b13d0-3e74-4aa5-a6b7-a5d2ac6c4a6c",
      "serviceofferingname": "System Offering For Software Router",
      "state": "Expunging",
      "templateid": "bb3f7e4e-d7f6-4a72-a752-12c3221e43e9",
      "version": "4.4.1",
      "zoneid": "3467ff63-b582-4ace-9fda-8d5851bd8753",
      "zonename": "Oakland"
    }
  ]
}
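
To spot every router in this state rather than just one, the JSON above can be filtered by its "state" field. This is only a minimal sketch: it assumes the cloudmonkey output has been captured as a JSON string, and the function name stuck_routers is made up for illustration. It relies only on the "router", "name", "id" and "state" keys shown above.

```python
import json


def stuck_routers(list_routers_json, stuck_states=("Expunging",)):
    """Return (name, id, state) for each router whose state is in stuck_states.

    list_routers_json is assumed to be the JSON text printed by a
    cloudmonkey router listing, shaped like the output in this thread.
    """
    data = json.loads(list_routers_json)
    return [
        (r.get("name"), r.get("id"), r.get("state"))
        for r in data.get("router", [])
        if r.get("state") in stuck_states
    ]


# Trimmed copy of the output quoted in this message.
sample = """
{
  "count": 1,
  "router": [
    {
      "id": "dc48a402-41d8-4e93-b441-4b34eb83a4c8",
      "name": "r-78-VM",
      "state": "Expunging"
    }
  ]
}
"""

for name, router_id, state in stuck_routers(sample):
    print(name, router_id, state)
```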

If I try to destroy the router from the API I get:

Async job cf08d7fa-1609-4d0e-b33c-63cc38f7e897 failed
Error 530, Unable to locate datastore with id 18
{
  "accountid": "e3389462-6020-425a-9b9e-57141d58e1ab",
  "cmd": "org.apache.cloudstack.api.command.admin.router.DestroyRouterCmd",
  "created": "2016-07-05T17:23:53+0200",
  "jobid": "cf08d7fa-1609-4d0e-b33c-63cc38f7e897",
  "jobprocstatus": 0,
  "jobresult": {
    "errorcode": 530,
    "errortext": "Unable to locate datastore with id 18"
  },
  "jobresultcode": 530,
  "jobresulttype": "object",
  "jobstatus": 2,
  "userid": "xxx"
}
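
The useful detail in that failure is the datastore id, since it can be compared against the storage pools that still exist. A rough sketch of pulling it out of the job result follows. It assumes the result is available as a JSON string and that the error text keeps the wording shown above; the helper name missing_datastore_id is hypothetical.

```python
import json
import re


def missing_datastore_id(job_json):
    """Return the datastore id named in a failed job's error text, else None."""
    job = json.loads(job_json)
    # jobstatus 2 is what the failed job in this thread reports.
    if job.get("jobstatus") != 2:
        return None
    text = job.get("jobresult", {}).get("errortext", "")
    match = re.search(r"datastore with id (\d+)", text)
    return int(match.group(1)) if match else None


# Trimmed copy of the failed job quoted in this message.
failed_job = """
{
  "jobid": "cf08d7fa-1609-4d0e-b33c-63cc38f7e897",
  "jobresult": {
    "errorcode": 530,
    "errortext": "Unable to locate datastore with id 18"
  },
  "jobresultcode": 530,
  "jobstatus": 2
}
"""

print(missing_datastore_id(failed_job))
```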

I'm guessing I need to remove all references to the routers from the
database. Does anyone know which table(s) they're stored in?

Re: Issue with 'stuck' virtual routers

Posted by Douglas Land <ds...@looprock.com>.

I was able to resolve this issue today with a good deal of help from Simon
Weller (thanks!)

We're running local storage and for some reason, though we put the host in
maintenance mode and shut down the virtual routers (I didn't do this
operation so I'm not 100% certain how we went about it), both routers got
stuck in the expunging state. We completely redisked the server, so that
storage pool no longer existed and the virtual routers certainly didn't
exist in any capacity, though they were still showing up in the API calls.

I tried a restartNetwork cleanup=false operation, which reported success
for one network and failure for the second, but no virtual routers were
created as a result. Eventually I deleted the migration jobs from
op_ha_work, and the instances themselves from vm_instance. It felt a bit
drastic, but marking them as Destroyed didn't seem to help and I was unable
to issue a destroy command via the API. So far that appears to have taken
care of the situation. Once I manually deleted the entries I was able to
execute restartNetwork cleanup=false successfully and the virtual routers
were recreated.

Frankly I'm a little nervous there might be other references to them in the
database that might haunt us later, and when we are able to schedule a
maintenance window I'm planning to do a restartNetwork cleanup=true to make
sure that works.
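
One way to look for leftover references is to run read-only queries keyed on the instance ids from op_ha_work (13 and 78 here). The sketch below only builds SELECT statements and changes nothing. Apart from op_ha_work.instance_id, which appears in the output earlier in this thread, the table and column names are assumptions about the CloudStack schema and should be checked with DESCRIBE first; reference_checks and CANDIDATE_TABLES are made-up names.

```python
# (table, column) pairs that may reference a router's instance id.
# ASSUMPTION: only op_ha_work.instance_id is confirmed by this thread;
# the rest are guesses to verify against the actual schema.
CANDIDATE_TABLES = [
    ("vm_instance", "id"),
    ("domain_router", "id"),
    ("op_ha_work", "instance_id"),
    ("volumes", "instance_id"),
    ("nics", "instance_id"),
]


def reference_checks(instance_ids):
    """Build read-only SELECT statements for the given instance ids."""
    # int() rejects anything that is not a plain number, so stray text
    # cannot end up inside the generated SQL.
    ids = ", ".join(str(int(i)) for i in instance_ids)
    return [
        f"SELECT * FROM {table} WHERE {column} IN ({ids});"
        for table, column in CANDIDATE_TABLES
    ]


for statement in reference_checks([13, 78]):
    print(statement)
```

Any rows that come back for tables other than the two already cleaned up would be the references worth a closer look before the next restartNetwork.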

On Tue, Jul 5, 2016 at 5:51 PM, ilya <il...@gmail.com> wrote:

> Hi Doug
>
> Do you have primary storage id 18 available?
>
> # cloudmonkey list storagepools id=18
>
> I can only assume cloudstack tries to clean up after itself and fails -
> because storage pool 18 is not available.
>
> Are you running a local storage zone or clustered?
>
> Lastly, your logs would indicate the issue more clearly - as to why it's
> not able to expunge.
>
> Regards
> ilya

Re: Issue with 'stuck' virtual routers

Posted by ilya <il...@gmail.com>.

Hi Doug

Do you have primary storage id 18 available?

# cloudmonkey list storagepools id=18

I can only assume cloudstack tries to clean up after itself and fails -
because storage pool 18 is not available.

Are you running a local storage zone or clustered?

Lastly, your logs would indicate the issue more clearly - as to why it's
not able to expunge.

Regards
ilya
