Posted to users@cloudstack.apache.org by Sébastien BRICE <ot...@opentelecom.fr> on 2020/11/08 03:08:40 UTC

Re: Cloudstack 4.11.3 to 4.13.1 SystemVMs Error

Hello there,

I ran into the same issue on 4.13 with XS 7.2 (brand-new system VMs reject every SSH connection made with a public key).

I managed to get it working by customizing the VDI of the defective SSVM and console proxy (basically, I set up a working authorized_keys file from a rescue ISO attached to the VM's root disk).

Once SSH access was recovered, I noticed the cloud service was down on the SSVM.

I had to modify the /var/cache/cloud/cmdline file and change this line:

root=UUID=8e6a713a-7d9b-4a33-82b1-52aa54473466 ro console=tty0 console=ttyS0,115200n8 console=hvc0 earlyprintk=xen net.ifnames=0 biosdevname=0 debian-installer=en_US nomodeset quiet  -- quiet console=hvc0 template=domP type=secstorage host=*localhost* port=8250 name=s-32-VM zone=1 pod=1 guid=s-32-VM workers=5 resource=com.cloud.storage.resource.PremiumSecondaryStorageResource instance=SecStorage sslcopy=false role=templateProcessor mtu=1500 eth2ip=192.168.66.73 eth2mask=255.255.255.0 gateway=192.168.66.1 public.network.device=eth2 eth0ip=169.254.126.19 eth0mask=255.255.0.0 eth1ip=192.168.56.193 eth1mask=255.255.255.0 localgw=192.168.56.1 private.network.device=eth1 internaldns1=10.1.1.1 dns1=8.8.8.8 nfsVersion=null

Obviously, the *localhost* setting was preventing the SSVM from talking to the management server.

As soon as I replaced it with the IP address of my management server and restarted cloud.service, the Java agent came up and showed green.
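
For anyone who wants to script that edit, here is a minimal Python sketch of the substitution, under some assumptions: the function name set_host is made up for illustration, 192.0.2.10 is only a placeholder for your own management server IP, and the sketch works on the cmdline text as a string rather than touching /var/cache/cloud/cmdline itself.

```python
import re


def set_host(cmdline: str, mgmt_ip: str) -> str:
    """Return cmdline with the value of its host= parameter replaced by mgmt_ip.

    Hypothetical helper for illustration; it is not part of CloudStack.
    """
    # Match "host=" only at the start of a whitespace-delimited token, so that
    # a token such as "localhost=..." would not be touched by mistake.
    new_cmdline, count = re.subn(
        r"(?<!\S)host=\S+",
        lambda match: "host=" + mgmt_ip,
        cmdline,
    )
    if count != 1:
        raise ValueError(f"expected exactly one host= parameter, found {count}")
    return new_cmdline
```

One way to use it would be to read the file, pass its contents through set_host(text, "192.0.2.10"), write the result back, and then restart cloud.service as described above. Keeping a backup copy of the original file first seems prudent.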

From there, ISOs and templates are uploading successfully.

Yours



----- Original Message -----
From: Ammad Syed [mailto:syedammad83@gmail.com]
To: <us...@cloudstack.apache.org>
Sent: Wed, 14 Oct 2020 13:16:35 +0500
Subject: Re: Cloudstack 4.11.3 to 4.13.1 SystemVMs Error

Hi,

I have upgraded another of my production environments, which has three XS 7.0
clusters with the latest hotfixes, from 4.11.3 to 4.13.1. The systemVM
deployment succeeds in one cluster but fails in another.

https://drive.google.com/file/d/1eBD20a1WJfe_eYCaTxM5JneMBnnebkd1/view?usp=sharing

You can review the logs at the link above. Below are the systemVMs whose
agent failed to come up, even though the XS tools were installed successfully
in them.

2020-09-15 12:37:52,415 ERROR [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-2:ctx-e478e44a job-13124/job-13131 ctx-7af48048)
(logid:f96bf120) Failed to setup keystore and generate CSR for system vm:
v-753-VM
2020-09-15 12:59:27,658 ERROR [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-15:ctx-5a06ea58 job-13124/job-13158 ctx-61790402)
(logid:f96bf120) Failed to setup keystore and generate CSR for system vm:
v-758-VM
2020-09-15 13:06:14,731 ERROR [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-24:ctx-4f858410 job-13124/job-13174 ctx-d127150a)
(logid:f96bf120) Failed to setup keystore and generate CSR for system vm:
v-760-VM
2020-09-15 13:21:37,907 ERROR [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-3:ctx-4c2ba231 job-13175/job-13184 ctx-6228d3cf)
(logid:6bd89e55) Failed to setup keystore and generate CSR for system vm:
v-761-VM
2020-09-15 13:25:43,696 ERROR [c.c.v.VirtualMachineManagerImpl]
(Work-Job-Executor-5:ctx-4ecf72a0 job-13175/job-13195 ctx-2aa50a59)
(logid:6bd89e55) Failed to setup keystore and generate CSR for system vm:
v-761-VM
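
To summarise entries like these from a larger log, a small Python sketch along the following lines could help. It assumes only the message text shown above; the function name failed_system_vms is hypothetical, and the pattern tolerates the line wrapping seen in this mail.

```python
import re
from collections import Counter

# The message text is taken from the log excerpt above; \s* lets the VM name
# sit on the following line, as it does when the log is wrapped by a mailer.
FAILURE_PATTERN = re.compile(
    r"Failed to setup keystore and generate CSR for system vm:\s*(\S+)"
)


def failed_system_vms(log_text: str) -> Counter:
    """Count keystore/CSR setup failures per system VM name in log_text."""
    return Counter(FAILURE_PATTERN.findall(log_text))
```

Run over the five entries above, this would report v-761-VM twice and the other three VMs once each.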

The systemVMs below came up in the cluster and have their agent
connected. It's a weird problem; I didn't see any errors in the XS logs.

v-754-VM
s-756-VM

Ammad Ali

On Fri, Oct 9, 2020 at 12:28 PM Andrija Panic <an...@gmail.com>
wrote:

> This error is from a different zone, as far as I can see (what I shared above).
>
> For the VM whose logs you asked us to check - can you see which zone it
> is in and then:
> - disable the zone,
> - destroy the SSVM (if it is showing in the Starting or Stopping state - make
> sure to destroy the VM in XenCenter first, then edit the vm_instance table
> to set state="Stopped" for that s-25457-VM) - then destroy it in ACS
> - enable the zone again; it should create a brand new SSVM (check that your
> CPUs are not heavily oversubscribed, as, in theory, this could also be an
> extremely slow CPU issue)
>
> If the issue continues, please post the log again (to pastebin or
> elsewhere) and share the name of the new SSVM that failed to be configured.
> It's also worth digging into the XS logs to see if you have some other issues
> that ACS is not aware of.
>
>
> Best,
>
> On Fri, 9 Oct 2020 at 09:09, Andrija Panic <an...@gmail.com>
> wrote:
>
> > Ammad,
> >
> > what's your current status with your issue?
> >
> > In logs I can see that there is some issue with SR:
> >
> >    2020-09-10 22:58:48,374 DEBUG [c.c.h.x.r.w.x.CitrixStartCommandWrapper] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) 1. The VM s-25505-VM is in Starting state.
> >    2020-09-10 22:58:48,375 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) no guest OS type, start it as HVM guest
> >    2020-09-10 22:58:48,390 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) Created VM 3fa168bd-9e29-03df-52bb-a0fe2a53d390 for s-25505-VM
> >    2020-09-10 22:58:48,394 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) PV args are %template=domP%type=secstorage%host=172.16.2.42%port=8250%name=s-25505-VM%zone=10%pod=10%guid=s-25505-VM%workers=5%resource=com.cloud.storage.resource.PremiumSecondaryStorageResource%instance=SecStorage%sslcopy=false%role=templateProcessor%mtu=1500%eth2ip=175.107.206.196%eth2mask=255.255.255.0%gateway=175.107.206.1%public.network.device=eth2%eth0ip=169.254.1.184%eth0mask=255.255.0.0%eth1ip=172.16.2.56%eth1mask=255.255.255.192%mgmtcidr=172.16.0.0/26%localgw=172.16.2.62%private.network.device=eth1%internaldns1=202.163.96.3%internaldns2=202.163.96.4%dns1=202.163.96.3%dns2=202.163.96.4%nfsVersion=null
> >    2020-09-10 22:58:48,396 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) HVM args are template=domP type=secstorage host=172.16.2.42 port=8250 name=s-25505-VM zone=10 pod=10 guid=s-25505-VM workers=5 resource=com.cloud.storage.resource.PremiumSecondaryStorageResource instance=SecStorage sslcopy=false role=templateProcessor mtu=1500 eth2ip=175.107.206.196 eth2mask=255.255.255.0 gateway=175.107.206.1 public.network.device=eth2 eth0ip=169.254.1.184 eth0mask=255.255.0.0 eth1ip=172.16.2.56 eth1mask=255.255.255.192 mgmtcidr=172.16.0.0/26 localgw=172.16.2.62 private.network.device=eth1 internaldns1=202.163.96.3 internaldns2=202.163.96.4 dns1=202.163.96.3 dns2=202.163.96.4 nfsVersion=null
> >    2020-09-10 22:58:48,399 DEBUG [c.c.h.x.r.CitrixResourceBase] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) Failed to find SR by name 'XenServer Tools', will try to find 'XCP-ng Tools' SR
> >    2020-09-10 22:58:48,400 WARN [c.c.h.x.r.w.x.CitrixStartCommandWrapper] (DirectAgent-270:ctx-3468556a) (logid:4cc5809d) Catch Exception: class com.cloud.utils.exception.CloudRuntimeException due to com.cloud.utils.exception.CloudRuntimeException: There are 0 SRs with name XenServer Tools
> > com.cloud.utils.exception.CloudRuntimeException: There are 0 SRs with name XenServer Tools
> > at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.createPatchVbd(CitrixResourceBase.java:1061)
> >
> > Your SSVM cannot even start, let alone be accessed.
> > Also, after the upgrade, I saw that the public SSH key was successfully
> > injected into the systemvm.iso, and that the ISO is copied to all hosts, as
> > well as the current id_rsa private key - so assuming your SSVM starts
> > successfully, you should be able to access it, as a) you have the id_rsa on
> > each XS host, and b) systemvm.iso does contain a good public key (inside the
> > authorized_keys file) - so assuming the network is up (169.254...), you should
> > be able to access it via ssh on port 3922.
> >
> > Best,
> >
> >
> >
> > On Mon, 14 Sep 2020 at 11:39, Ammad Syed <sy...@gmail.com> wrote:
> >
> >> You can check the logs of job-1248274 or SSVM s-25457-VM, whose agent has
> >> connected successfully.
> >>
> >> Ammad Ali
> >>
> >> On Mon, Sep 14, 2020 at 2:33 PM Ammad Syed <sy...@gmail.com> wrote:
> >>
> >> > Here are the upgrade management server logs.
> >> >
> >> >
> >> > https://drive.google.com/file/d/17cxh7f-24ibnXCKvUUPqt3p1YTzlMQN8/view?usp=sharing
> >> >
> >> > Ammad Ali
> >> >
> >> > On Mon, Sep 14, 2020 at 2:17 PM Ammad Syed <sy...@gmail.com> wrote:
> >> >
> >> >> In addition to my previous email: there is only one host in one zone where
> >> >> the systemVM agent comes up; on all other hosts in that zone the agent
> >> >> fails.
> >> >>
> >> >> If you need them, I can provide the management server logs as well.
> >> >>
> >> >> Also, is there a way to enable debugging in the ACS logs to find out
> >> >> specifically where the problem is?
> >> >>
> >> >> Ammad Ali
> >> >>
> >> >> On Mon, Sep 14, 2020 at 10:39 AM Ammad Syed <sy...@gmail.com>
> >> >> wrote:
> >> >>
> >> >>> Hi Pearl,
> >> >>>
> >> >>> I have taken those steps and verified that systemvm.iso is copied to all
> >> >>> hosts in all zones.
> >> >>>
> >> >>> I recreated the systemVM, SSHed to its host, and compared the md5sum of
> >> >>> the ISO there with the one on the ACS host. Both were the same.
> >> >>>
> >> >>> However, the host on which the systemVM was working also has the same
> >> >>> md5sum for the systemvm ISO. The ISO is being copied successfully, so the
> >> >>> problem looks to be somewhere else.
> >> >>>
> >> >>> I have checked the XenServer logs as well but didn't find any entries
> >> >>> showing that something had failed.
> >> >>>
> >> >>> Ammad
> >> >>> Sent from my iPhone
> >> >>>
> >> >>> > On 14-Sep-2020, at 9:15 AM, Pearl d'Silva <pearl.dsilva@shapeblue.com> wrote:
> >> >>> >
> >> >>> > Hi Ammad,
> >> >>> >
> >> >>> > Is my understanding right that the steps you mentioned in the
> >> >>> > previous mail did in fact work in one zone? If that's the case, could you
> >> >>> > please ensure that all the hosts in all the other zones have the new
> >> >>> > systemVM ISO copied onto them, by checking the timestamps and
> >> >>> > comparing the checksums against the ISO on the management server, so that
> >> >>> > when the system VMs are recreated, they pick up the new ISO.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Pearl
> >> >>> >
> >> >>> > ________________________________
> >> >>> > From: Ammad Syed <sy...@gmail.com>
> >> >>> > Sent: Sunday, September 13, 2020 12:49 PM
> >> >>> > To: users@cloudstack.apache.org <us...@cloudstack.apache.org>
> >> >>> > Subject: Re: Cloudstack 4.11.3 to 4.13.1 SystemVMs Error
> >> >>> >
> >> >>> > Hi Andrija,
> >> >>> >
> >> >>> > Here are the permissions of the mount point in the SSVM.
> >> >>> >
> >> >>> > root@s-25437-VM:~# df -h
> >> >>> > Filesystem               Size  Used Avail Use% Mounted on
> >> >>> > udev                     233M     0  233M   0% /dev
> >> >>> > tmpfs                     98M   19M   80M  19% /run
> >> >>> > /dev/xvda5               1.1G  773M  282M  74% /
> >> >>> > tmpfs                    244M     0  244M   0% /dev/shm
> >> >>> > tmpfs                    5.0M     0  5.0M   0% /run/lock
> >> >>> > tmpfs                    244M     0  244M   0% /sys/fs/cgroup
> >> >>> > /dev/xvda1                92M   35M   57M  39% /boot
> >> >>> > /dev/xvda6               435M   21M  410M   5% /var
> >> >>> > /dev/xvda7                75M  1.6M   72M   3% /tmp
> >> >>> > 172.16.10.35:/nfs/KHI02   12T  7.0T  5.1T  59% /mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092
> >> >>> > tmpfs                     49M     0   49M   0% /run/user/0
> >> >>> > root@s-25437-VM:~#
> >> >>> > root@s-25437-VM:~#
> >> >>> > root@s-25437-VM:~# cd /mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092
> >> >>> > root@s-25437-VM:/mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092#
> >> >>> > root@s-25437-VM:/mnt/SecStorage/8ea71ccb-e493-3c7e-8bb0-97871f5c2092# ls -lah
> >> >>> > total 12K
> >> >>> > drwxrwxrwx  5 root root   70 Dec 21  2018 .
> >> >>> > drwxrwxrwx  3 root root 4.0K Sep 10 23:12 ..
> >> >>> > drwxrwxrwx 52 root root 4.0K Aug 13 19:37 snapshots
> >> >>> > drwxrwxrwx  3 root root   26 Jun  7  2013 template
> >> >>> > drwxrwxrwx 98 root root 4.0K Sep  1 12:52 volumes
> >> >>> >
> >> >>> > -Ammad Ali
> >> >>> >
> >> >>> >> On Fri, Sep 11, 2020 at 6:09 PM Andrija Panic <andrija.panic@gmail.com>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >> Can you share the permissions of your secondary storage (mount it, then
> >> >>> >> ls -lah the mount point)?
> >> >>> >>
> >> >>> >
> >> >>> > pearl.dsilva@shapeblue.com
> >> >>> > www.shapeblue.com
> >> >>> > 3 London Bridge Street,  3rd floor, News Building, London  SE1 9SG UK
> >> >>> > @shapeblue
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> >>> On Fri, 11 Sep 2020 at 01:39, Ammad Syed <sy...@gmail.com> wrote:
> >> >>> >>>
> >> >>> >>> Hi Andrija,
> >> >>> >>>
> >> >>> >>> I have performed an upgrade on my production system from 4.11.3 to
> >> >>> >>> 4.13.1. I even cleared the tags, but the issue is still there.
> >> >>> >>>
> >> >>> >>> I have four zones. In only one zone, on one specific host, the
> >> >>> >>> systemVM's agent comes up; in all other zones and pods, key injection
> >> >>> >>> into the systemVM is still failing. I have checked that the updated ISO
> >> >>> >>> is there on all hosts. The md5sum of systemvm.iso is the same on the Xen
> >> >>> >>> hosts and the ACS host.
> >> >>> >>>
> >> >>> >>> It looks like a weird problem. How can I troubleshoot this further? Any
> >> >>> >>> advice would be appreciated.
> >> >>> >>>
> >> >>> >>> -Ammad
> >> >>> >>>
> >> >>> >>
> >> >>> >>
> >> >>> >> --
> >> >>> >>
> >> >>> >> Andrija Panić
> >> >>> >>
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Regards,
> >> >>> >
> >> >>> >
> >> >>> > Syed Ammad Ali
> >> >>>
> >> >>
> >> >>
> >> >> --
> >> >> Regards,
> >> >>
> >> >>
> >> >> Syed Ammad Ali
> >> >>
> >> >
> >> >
> >> > --
> >> > Regards,
> >> >
> >> >
> >> > Syed Ammad Ali
> >> >
> >>
> >>
> >> --
> >> Regards,
> >>
> >>
> >> Syed Ammad Ali
> >>
> >
> >
> > --
> >
> > Andrija Panić
> >
>
>
> --
>
> Andrija Panić
>


-- 
Regards,


Syed Ammad Ali


