Posted to users@cloudstack.apache.org by "Pascal R." <re...@gmail.com> on 2016/07/04 11:25:52 UTC

CS 4.8 VMware - Virtual Router stuck at starting

hi,

We have a CS 4.8 deployment with VMware 5.5.

When trying to launch the first VM, the VR is created. The VR starts up, but
in CS it is stuck in the "Starting" state.

I can't find any useful information in the logs.

Any hints?

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Jacob Seeley <js...@vertitechit.com>.
Hello,

I second this problem. I'm a new CloudStack user. I'm using CloudStack 4.8 on CentOS Linux release 7.2.1511 (Core). My hypervisor is VMware (vCenter/ESXi 6). I can successfully deploy a new basic zone with a pod, cluster, and host with primary and secondary storage. I'm using the 4.6 SystemVM Template (systemvm64template-4.6.0-vmware.ova). Both the Console Proxy and Secondary Storage System VMs get deployed and show as Running in CloudStack. However, when I deploy my first instance (either from a template or ISO) and the Virtual Router gets deployed, CloudStack continues to report the Virtual Router as Starting even though through vSphere I can see it as up and running.

CloudStack reports that the Virtual Router requires an upgrade. When I do this, all that happens is the Virtual Router powers down and then starts again.

Restarting the CloudStack service or the Virtual Router or destroying and recreating the Virtual Router does not help. 

At this point, I'm dead in the water. I've tried using CloudStack 4.5 on CentOS 7 but had no luck starting the service. I was able to install and configure CloudStack 4.5 on CentOS 6; however, it is unable to connect to vCenter 6.

Any help or suggestions would be greatly appreciated.

Thank you,

Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631

www.vertitechit.com

-----Original Message-----
From: Glenn Wagner [mailto:glenn.wagner@shapeblue.com] 
Sent: Tuesday, July 5, 2016 2:36 AM
To: users@cloudstack.apache.org
Subject: RE: CS 4.8 VMware - Virtual Router stuck at starting

Hi,

Just to confirm, are you using basic networking or advanced networking?

Glenn
 

glenn.wagner@shapeblue.com
www.shapeblue.com
2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town  7130South Africa @shapeblue
  
 


-----Original Message-----
From: Darren Tang [mailto:darrentang.dt@gmail.com]
Sent: Tuesday, 05 July 2016 5:16 AM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

https://issues.apache.org/jira/browse/CLOUDSTACK-9144

2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:

> Hi,
>
> What template are you using to start your first VM? - the default 
> vmware template?
> If you look in vcenter , what does the console show you ?
>
>
> Glenn
>
>
>
> glenn.wagner@shapeblue.com
> www.shapeblue.com
> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town 
> 7130South Africa @shapeblue
>
>
>
>
> -----Original Message-----
> From: Pascal R. [mailto:repa182@gmail.com]
> Sent: Monday, 04 July 2016 1:26 PM
> To: users@cloudstack.apache.org
> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>
> hi,
>
> we have a CS4.8 deployment with VMWare 5.5.
>
> When trying to launch the first VM, the VS is created. VS starts up, 
> but in CS, it stuck with "starting" state.
>
> i can't find any usefull information in the logs.
>
> any hint?
>

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Glenn Wagner <gl...@shapeblue.com>.
Hi,

Just to confirm, are you using basic networking or advanced networking?

Glenn
 

glenn.wagner@shapeblue.com
www.shapeblue.com
2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town 7130, South Africa
@shapeblue
  
 


-----Original Message-----
From: Darren Tang [mailto:darrentang.dt@gmail.com] 
Sent: Tuesday, 05 July 2016 5:16 AM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

https://issues.apache.org/jira/browse/CLOUDSTACK-9144

2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:

> Hi,
>
> What template are you using to start your first VM? - the default 
> vmware template?
> If you look in vcenter , what does the console show you ?
>
>
> Glenn
>
>
>
> glenn.wagner@shapeblue.com
> www.shapeblue.com
> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town 
> 7130South Africa @shapeblue
>
>
>
>
> -----Original Message-----
> From: Pascal R. [mailto:repa182@gmail.com]
> Sent: Monday, 04 July 2016 1:26 PM
> To: users@cloudstack.apache.org
> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>
> hi,
>
> we have a CS4.8 deployment with VMWare 5.5.
>
> When trying to launch the first VM, the VS is created. VS starts up, 
> but in CS, it stuck with "starting" state.
>
> i can't find any usefull information in the logs.
>
> any hint?
>

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Jacob Seeley <js...@vertitechit.com>.
Thank you ilya. I thought I tried 4.5 as part of my troubleshooting and had an issue using it with vCenter 6 though now I can't recall for sure. I will try that again.

Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631

www.vertitechit.com

-----Original Message-----
From: ilya [mailto:ilya.mailing.lists@gmail.com] 
Sent: Friday, July 29, 2016 2:01 PM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

Jacob

So you are setting up a basic zone, which means you have a single network for both hypervisors and guest VMs (or everything). In that case the control network should be 0.0.0.0, since there are no other networks. I was under the assumption that you were using an advanced zone, but it's clearer now.

I'd suggest you start off with 4.5 for the time being - while we raise a blocker issue for 4.8 (and possibly 4.9).

I'll see if i can spend some cycles to investigate this.

Regards
ilya



On 7/29/16 7:20 AM, Jacob Seeley wrote:
> ilya,
> 
> I'm using a Basic zone. Here is the workflow I'm using with actual IP addresses. Any fields I left out you can assume I leave blank.
> 
> Add Zone
> 
> Zone Type: Basic
> Name: ZONE1
> IPv4 DNS: 10.70.116.20
> Internal DNS 1: 10.70.116.20
> Hypervisor: VMware
> Network Offering: DefaultSharedNetworkOffering
> 
> Physical Network
> 
> Management: vSwitch Name: vSwitch0
> Guest: vSwitch Name: vSwitch0
> 
> Pod name: POD1
> Reserved system gateway: 10.70.116.1
> Reserved system netmask: 255.255.255.0
> Start Reserved system IP: 10.70.116.60
> End Reserved system IP: 10.70.116.79
> 
> Guest Gateway: 10.70.116.1
> Guest Netmask: 255.255.255.0
> Guest start IP: 10.70.116.80
> Guest end IP: 10.70.116.99
> 
> The rest is Storage and is probably irrelevant here.
> 
> After I go through the wizard of adding a zone, it asks me to enable it, which I do. Without any further action, 2 System VMs (Console and Secondary) are created. The default CentOS template is downloaded. Both System VMs receive 2x IP addresses, one on the Pod network and one on the guest network.
> 
> System Storage VM
> Public IP address: 10.70.116.81
> Private IP address: 10.70.116.73
> Gateway: 10.70.116.1
> 
> Console Proxy VM
> Public IP address: 10.70.116.80
> Private IP address: 10.70.116.74
> Gateway: 10.70.116.1
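
A quick way to see which range an address belongs to is to compare it numerically against the start and end addresses from the zone wizard. The following is only a minimal sketch; the helper names ip_to_int and ip_in_range are made up for illustration and are not part of CloudStack. The ranges and sample addresses are the ones reported in this thread.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to one integer so ranges can be compared.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 when address $1 lies within the inclusive range $2..$3.
ip_in_range() {
    ip=$(ip_to_int "$1")
    lo=$(ip_to_int "$2")
    hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

# Ranges as entered in the Add Zone wizard.
POD_START=10.70.116.60;   POD_END=10.70.116.79
GUEST_START=10.70.116.80; GUEST_END=10.70.116.99

# A system VM private IP, the router's guest IP, and the router's control IP.
for addr in 10.70.116.73 10.70.116.92 0.0.0.0; do
    if ip_in_range "$addr" "$POD_START" "$POD_END"; then
        echo "$addr: pod (reserved system) range"
    elif ip_in_range "$addr" "$GUEST_START" "$GUEST_END"; then
        echo "$addr: guest range"
    else
        echo "$addr: in neither range"
    fi
done
```

The loop simply labels each address, which makes it easy to spot a NIC that never received an address from either range.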
> 
> Only when I initiate my first VM from template (or even ISO) is a 
> Virtual Router deployed. As mentioned before, it gets two NICs, the first being of Traffic Type Guest with an IP address of 10.70.116.92 and the second of Traffic Type Control with no IP address assigned (it reports 0.0.0.0). Ultimately the virtual router gets deployed on the hypervisor (VMware) but it's useless. The instance I tried deploying ultimately fails. I suspect that this is the problem or a problem. The virtual router gets an IP address on the guest network but not the management network.
> 
> Regarding the cloud agent/service (/etc/init.d/cloud) on the virtual 
> router. I mentioned earlier that I found that /etc/init.d/cloud on the 
> virtual router fails. I found this happens because 
> /usr/local/cloud/systemvm never gets populated on the virtual router. 
> Further down the rabbit hole I go, I see there is a script, 
> /opt/cloud/bin/patchsystemvm.sh, that is responsible for mounting the 
> systemvm.iso and unzipping the contents to /usr/local/cloud/systemvm. 
> Both System VMs (console and secondary) do this but not the virtual 
> router. From what I can tell, the reason for this is as follows. If you 
> look at the script (found here: 
> https://github.com/apache/cloudstack/blob/master/systemvm/patches/debian/config/opt/cloud/bin/patchsystemvm.sh)
> 
> There is a function of the script called patch_console_proxy. This 
> function gets called only if the following is satisfied: if [ "$TYPE" 
> == "consoleproxy" ] || [ "$TYPE" == "secstorage" ] && [ -f 
> ${PATCH_MOUNT}/systemvm.zip ]
> 
> I've noticed that the value of TYPE in every case I've tried this with the virtual router is dhcpsrvr. According to that script, the function that gets called for TYPE=dhcpsrvr is dhcpsrvr_svcs. That function does the following:
> 
> dhcpsrvr_svcs() {
>    chkconfig cloud off
>    chkconfig cloud-passwd-srvr on ; 
>    chkconfig haproxy off ; 
>    chkconfig dnsmasq on
>    chkconfig ssh on
>    chkconfig nfs-common off
>    chkconfig portmap off
>    chkconfig keepalived off
>    chkconfig conntrackd off
>    echo "ssh dnsmasq cloud-passwd-srvr apache2" > /var/cache/cloud/enabled_svcs
>    echo "cloud nfs-common haproxy portmap" > /var/cache/cloud/disabled_svcs
> }
> 
> Here you can see that it turns off the cloud service. As far as I can tell, my virtual router is executing this function, so this is expected behavior. This tells me that the cloud service is never meant to run when TYPE=dhcpsrvr.
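
The condition quoted above mixes "||" and "&&". In a shell list these two operators have equal precedence and are evaluated left to right, so "A || B && C" behaves like "(A || B) && C". The sketch below reproduces only that one condition in a hypothetical helper called should_patch (the name and the temporary file standing in for ${PATCH_MOUNT}/systemvm.zip are assumptions for illustration). It says nothing about what the rest of patchsystemvm.sh does for a router.

```shell
#!/bin/sh
# Hypothetical helper mirroring the quoted condition:
#   [ TYPE is consoleproxy ] || [ TYPE is secstorage ] && [ zip file exists ]
# The function's exit status is the status of the whole AND-OR list.
should_patch() {
    type=$1
    zip=$2
    [ "$type" = "consoleproxy" ] || [ "$type" = "secstorage" ] && [ -f "$zip" ]
}

# Demo with a temporary file standing in for systemvm.zip.
tmpzip=$(mktemp)
for t in consoleproxy secstorage dhcpsrvr router; do
    if should_patch "$t" "$tmpzip"; then
        echo "$t: condition true, systemvm.zip would be unpacked"
    else
        echo "$t: condition false, nothing unpacked by this branch"
    fi
done
rm -f "$tmpzip"
```

With type=dhcpsrvr both type tests fail, so the file test is never reached and the branch is skipped, which is consistent with /usr/local/cloud/systemvm staying empty on the router.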
> 
> As you mentioned before, I've since tried manually assigning an IP address to NIC1 on the virtual router, but it doesn't seem to help or do anything.
> 
> Since I have never had a working setup of CloudStack before, it is harder for me to debug the issue because I'm not always sure how something should work. Unfortunately, I have no choice but to try and make this work on the VMware hypervisor. I think what I'm going to do now is set up a KVM hypervisor and see if I can get this working using the same software versions and workflow.
> 
> Thank you for your help.
> 
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> 
> www.vertitechit.com
> 
> -----Original Message-----
> From: ilya [mailto:ilya.mailing.lists@gmail.com]
> Sent: Friday, July 29, 2016 2:43 AM
> To: users@cloudstack.apache.org
> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> 
> Darren
> 
> I'm also running 4.5.2 - and like the stability we get with it.
> 
> For the features we need, 4.5.2 has everything that is required, so I don't see a huge benefit in upgrading to the latest ACS at the moment. Also, our environments are very large and complex, so an upgrade is not something I can take lightly.
> 
> With that said, I do have a small 8-node lab environment I can try the upgrade on. It consists of 4 ESXi and 4 KVM nodes, so it should be a fair test.
> 
> Let's wait for Jacob to respond with his test of setting up an IP/netmask for eth1 on the router VM. If it does not help, I'll try to upgrade to see if I can reproduce the issue.
> 
> Regards
> ilya
> 
> On 7/28/16 9:43 PM, Darren Tang wrote:
>> Hi ilya:
>>  I can confirm that issue, please check:
>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>  When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a 
>> basic zone, the VR never left the "Starting" state. Falling back 
>> to 4.5 works fine.
>>  Maybe you can test it yourself.
>>
>> 2016-07-29 3:24 GMT+08:00 ilya <il...@gmail.com>:
>>
>>> I guess it would help to know what type of zone you use?
>>>
>>> Is it advanced, isolated vpc or shared network? what type of isolation?
>>> or perhaps basic zone?
>>>
>>> Lastly, try stopping the iptables and restarting cloud agent (via 
>>> stop and start)
>>>
>>> Please see my response in-line
>>>
>>> On 7/28/16 6:58 AM, Jacob Seeley wrote:
>>>> Hi ilya,
>>>>
>>>> Funny you brought up debugging the router VM. After I responded
>>> yesterday, I did just that and I did find some odd things.
>>>> Just to be clear (I think we're on the same page), since I'm not 
>>>> the OP
>>> of this thread, the virtual router always gets deployed and it 
>>> starts up just fine; however, CloudStack reports that it's always stuck in starting.
>>> VMs that get deployed ultimately fail. CloudStack reports the router 
>>> version as UNKNOWN.
>>>> Before I provide what I found debugging the router VM, I'll address 
>>>> some
>>> of your points.
>>>>
>>>> ### FOLLOW-UP QUESTIONS ###
>>>>
>>>> " Another reason would be an issue of hypervisor accessing the NFS 
>>>> mount
>>> used for secondary storage."
>>>> I don't believe this is an issue. The hypervisor (VMware) does 
>>>> mount the
>>> secondary storage via NFS just fine. If this were an issue, I would 
>>> think the Secondary Storage and Console VMs would not deploy.
>>>>
>>>> " Use console of vCenter to see what is happening on router vm. You 
>>>> can
>>> login locally with root/password and see the content of 
>>> /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
>>>> It looks to me like /var/log/cloud.out is only written to when
>>> $CLOUD_DEBUG is set to a non-zero length in the /etc/init.d/cloud script.
>>> As such, there isn't even a file for /var/log/cloud.out. Even when I 
>>> set that variable, I never get anything logged to /var/log/cloud.out.
>>> However, there is a /var/log/cloud.log. Here are the contents of that:
>>> http://pastebin.com/aaTsRKZE
>>>>
>>>> " you can also run /etc/init.d/cloud stop and start.. that will 
>>>> give you
>>> a fresh start on logs.."
>>>> The service is in a failed state. It's worth noting that this 
>>>> service is
>>> in a started state on the Console and Secondary Storage VMs.
>>>
>>> this is concerning - see you did "sh -x", read on..
>>>
>>>>
>>>> " also, confirm that management server can talk to VR on POD IP
>>>> (management) on port 3922.."
>>>> It appears this is not an issue; see below:
>>>
>>> 3922 from MS to VR - this is the SSH daemon on VR with private key
>>> 8250 from VR to MS - cloudstack java agent on VR talking to MS
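
Both directions can be probed without telnet. The sketch below is an assumption-laden example: it relies on bash's /dev/tcp pseudo-device and the coreutils "timeout" command being present on the machine where it runs, and the helper names port_open and report_port are invented for illustration.

```shell
#!/bin/sh
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# bash and timeout are assumed to be installed; /dev/tcp is a bash feature.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Print a one-line result for the host/port pair being tested.
report_port() {
    if port_open "$1" "$2"; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 NOT reachable"
    fi
}

# On the management server (replace the placeholder with the router's IP):
#   report_port <router-ip> 3922
# On the virtual router (management server address as used in this thread):
#   report_port 10.70.110.101 8250
```

A successful connection only shows that the port is open; it does not prove that the agent or SSH key exchange behind it works.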
>>>
>>>
>>>>
>>>> root@r-4-VM:~# telnet 10.70.110.101 8250
>>>> Trying 10.70.110.101...
>>>> Connected to 10.70.110.101.
>>>> Escape character is '^]'.
>>>>
>>>
>>>
>>>> ### ROUTER VM DEBUG ###
>>>>
>>>> Here is what I found when the router VM gets deployed (please tell me
>>>> if anything seems off):
>>>> 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an 
>>>> IP
>>> address coming from the defaultGuestNetwork. NIC2 is traffic type 
>>> Control but has an IP address of 0.0.0.0
>>>
>>> It is an issue for concern to see 0.0.0.0 assigned to eth1
>>>
>>> Let's assume NIC1 (as eth0) and NIC2 (as eth1).
>>>
>>> 1) we should not be getting 0.0.0.0 for eth1 - aka control network. 
>>> This IP should be coming from the POD network range -> when you 
>>> added a pod - i assume you did it as part of Add Zone wizard...
>>>
>>> To see the pod IP range, go to the UI:
>>> Infrastructure, Zones, Your Zone, Physical Network, Physical Network
>>> 1 (assume you did not create anything special), Management, IP 
>>> Ranges
>>> -> you should see a range defined there and it should not be 0.0.0.0...
>>>
>>>> From the CloudStack management server, I cannot SSH into the router 
>>>> VM
>>> on NIC1. I've found this is because of iptables rules on the router 
>>> VM. If I issue a /etc/init.d/iptables-persistent flush on the router 
>>> VM, I can SSH into the router VM using the SSH key at port 3922.
>>>> The service "cloud" is in a failed state. Looking at the cloud init
>>> script, I see the following:
>>>>
>>>> CMDLINE=$(cat /var/cache/cloud/cmdline)
>>>>
>>>> TYPE="router"
>>>> for i in $CMDLINE
>>>>   do
>>>>     # search for foo=bar pattern and cut out foo
>>>>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>>>>     case $FIRSTPATTERN in
>>>>       type)
>>>>           TYPE=$(echo $i | cut -d= -f2)
>>>>       ;;
>>>>     esac
>>>> done
>>>>
>>>> The file /var/cache/cloud/cmdline exists; here are the contents:
>>>>
>>>> template=domP name=r-4-VM eth0ip=10.70.116.75 
>>>> eth0mask=255.255.255.0
>>> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
>>> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr 
>>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 
>>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qld
>>> O vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep3
>>> 7 aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> host=10.70.110.101 port=8080
>>> nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
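
To pull single values such as type or eth1ip out of a cmdline like the one above, the same cut-based splitting used in the init script excerpt can be wrapped in a small function. This is a minimal sketch; cmdline_value is a made-up name, and the sample string is a shortened copy of the values shown in this thread. Unlike the excerpt, it uses "-f2-" so a value that itself contains "=" is kept whole.

```shell
#!/bin/sh
# Print the value of KEY from a space-separated list of key=value items.
# Prints nothing and returns 1 when the key is absent.
cmdline_value() {
    key=$1
    shift
    for item in $*; do
        name=$(echo "$item" | cut -d= -f1)
        if [ "$name" = "$key" ]; then
            echo "$item" | cut -d= -f2-
            return 0
        fi
    done
    return 1
}

# Shortened sample based on the cmdline shown above.
SAMPLE="template=domP name=r-4-VM eth0ip=10.70.116.75 eth1ip=0.0.0.0 eth1mask=0.0.0.0 type=dhcpsrvr host=10.70.110.101 port=8080"

for k in type eth0ip eth1ip eth1mask; do
    echo "$k=$(cmdline_value "$k" "$SAMPLE")"
done

# On a system VM the real file could be passed instead:
#   cmdline_value type "$(cat /var/cache/cloud/cmdline)"
```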
>>>>
>>>
>>>
>>> You can also try updating your  /var/cache/cloud/cmdline with proper 
>>> value for eth1ip=0.0.0.0 eth1mask=0.0.0.0, you can look it up under 
>>> Infrastructure, Routers, r-4, Nics and look for control nic..
>>>
>>> Then try starting the cloud service..
>>>
>>> Also, did you enable baremetal support? can you deploy a zone 
>>> without baremetal support? Perhaps there is a bug on how IPs are 
>>> assigned to
>>> eth1 (control nic)...
>>>
>>>
>>>> The previous code suggests that the value of TYPE starts as router 
>>>> but
>>> will get set to dhcpsrvr, as indicated by the contents of 
>>> /var/cache/cloud/cmdline. Is this normal?
>>>> Further down the script, I see:
>>>>
>>>> CLOUDSTACK_HOME="/usr/local/cloud"
>>> <----------------------------------------Exists
>>>> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];
>>> <----------------------------------------Does not exist. Seems odd!
>>>> then
>>>>   . $CLOUDSTACK_HOME/systemvm/utils.sh
>>>> else
>>>>   _failure
>>>> fi
>>>>
>>>> # mkdir -p /var/log/vmops
>>>>
>>>> start() {
>>>>    local pid=$(get_pids)
>>>>    if [ "$pid" != "" ]; then
>>>>        echo "CloudStack cloud sevice is already running, PID = $pid"
>>>>        return 0
>>>>    fi
>>>>
>>>>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>>>>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];
>>> <------------------------------------------------------Does not exist.
>>> Seems odd!
>>>>    then
>>>>      if [ "$pid" == "" ]
>>>>      then
>>>>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>>>>        pid=$(get_pids)
>>>>        echo $pid > /var/run/cloud.pid
>>>>      fi
>>>>      _success
>>>>    else
>>>>      _failure
>>>>    fi
>>>>    echo
>>>>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
>>>> }
>>>>
>>>> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
>>> exists; however, the script then looks for the file 
>>> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It is also 
>>> supposed to start the script run.sh, but that also doesn't 
>>> exist. This seems like a problem to me.
>>>> Here you can see the step-through when I try to start the cloud service:
>>>>
>>>> sh -x /etc/init.d/cloud start
>>>> + ENABLED=0
>>>> + [ -e /etc/default/cloud ]
>>>> + . /etc/default/cloud
>>>> + ENABLED=0
>>>> + cat /var/cache/cloud/cmdline
>>>> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
>>> eth0mask=255.255.255.0 gateway=10.70.116.1 
>>> domain=vit.vertitechit.com
>>> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 
>>> mgmtcidr=
>>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr 
>>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 
>>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qld
>>> O vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep3
>>> 7 aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> host=10.70.110.101 port=8080
>>> nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>> + [ ! -z ]
>>>> + LOG_FILE=/dev/null
>>>> + TYPE=router
>>>> + cut -d= -f1
>>>> + echo template=domP
>>>> + FIRSTPATTERN=template
>>>> + cut -d= -f1
>>>> + echo name=r-4-VM
>>>> + FIRSTPATTERN=name
>>>> + cut -d= -f1
>>>> + echo eth0ip=10.70.116.75
>>>> + FIRSTPATTERN=eth0ip
>>>> + cut -d= -f1
>>>> + echo eth0mask=255.255.255.0
>>>> + FIRSTPATTERN=eth0mask
>>>> + cut -d= -f1
>>>> + echo gateway=10.70.116.1
>>>> + FIRSTPATTERN=gateway
>>>> + cut -d= -f1
>>>> + echo domain=vit.vertitechit.com
>>>> + FIRSTPATTERN=domain
>>>> + cut -d= -f1
>>>> + echo cidrsize=24
>>>> + FIRSTPATTERN=cidrsize
>>>> + cut -d= -f1
>>>> + echo dhcprange=10.70.116.1
>>>> + FIRSTPATTERN=dhcprange
>>>> + cut -d= -f1
>>>> + echo eth1ip=0.0.0.0
>>>> + FIRSTPATTERN=eth1ip
>>>> + cut -d= -f1
>>>> + echo eth1mask=0.0.0.0
>>>> + FIRSTPATTERN=eth1mask
>>>> + cut -d= -f1
>>>> + echo mgmtcidr=10.70.110.0/24
>>>> + FIRSTPATTERN=mgmtcidr
>>>> + cut -d= -f1
>>>> + echo localgw=10.70.116.1
>>>> + FIRSTPATTERN=localgw
>>>> + cut -d= -f1
>>>> + echo sshonguest=true
>>>> + FIRSTPATTERN=sshonguest
>>>> + cut -d= -f1
>>>> + echo type=dhcpsrvr
>>>> + FIRSTPATTERN=type
>>>> + cut -d= -f2
>>>> + echo type=dhcpsrvr
>>>> + TYPE=dhcpsrvr
>>>> + cut -d= -f1
>>>> + echo disable_rp_filter=true
>>>> + FIRSTPATTERN=disable_rp_filter
>>>> + cut -d= -f1
>>>> + echo extra_pubnics=2
>>>> + FIRSTPATTERN=extra_pubnics
>>>> + cut -d= -f1
>>>> + echo dns1=10.70.10.21
>>>> + FIRSTPATTERN=dns1
>>>> + cut -d= -f1
>>>> + echo
>>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qld
>>> O vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>>> + FIRSTPATTERN=baremetalnotificationsecuritykey
>>>> + cut -d= -f1
>>>> + echo
>>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep3
>>> 7 aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>>> + FIRSTPATTERN=baremetalnotificationapikey
>>>> + cut -d= -f1
>>>> + echo host=10.70.110.101
>>>> + FIRSTPATTERN=host
>>>> + cut -d= -f1
>>>> + echo port=8080
>>>> + FIRSTPATTERN=port
>>>> + cut -d= -f1
>>>> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>> + FIRSTPATTERN=nic_macs
>>>> + [ -f /etc/init.d/functions ]
>>>> + [ -f ./lib/lsb/init-functions ]
>>>> + RETVAL=0
>>>> + CLOUDSTACK_HOME=/usr/local/cloud
>>>> + [ -f /usr/local/cloud/systemvm/utils.sh ]
>>>> + _failure
>>>> + [ -f /etc/init.d/functions ]
>>>> + echo Failed
>>>> Failed
>>>> + [ 0 != 0 ]
>>>> + exit 0
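
The trace stops at the test for utils.sh. A short check like the one below lists which of the two files the init script looks for (utils.sh and run.sh, as seen in the excerpt earlier) are missing. It is only a sketch: missing_agent_files is a hypothetical helper, and the demo runs against a temporary directory rather than a real system VM.

```shell
#!/bin/sh
# Report which files expected by /etc/init.d/cloud are missing under the
# given CLOUDSTACK_HOME (default /usr/local/cloud).
# Returns 0 only when both files are present.
missing_agent_files() {
    home=${1:-/usr/local/cloud}
    rc=0
    for f in systemvm/utils.sh systemvm/run.sh; do
        if [ ! -f "$home/$f" ]; then
            echo "missing: $home/$f"
            rc=1
        fi
    done
    return $rc
}

# Demo: a temporary tree that contains utils.sh but not run.sh.
demo=$(mktemp -d)
mkdir -p "$demo/systemvm"
touch "$demo/systemvm/utils.sh"
missing_agent_files "$demo" || true
rm -rf "$demo"

# On the router VM itself:
#   missing_agent_files /usr/local/cloud || echo "agent files were never unpacked"
```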
>>>>
>>>> Thoughts?
>>>>
>>>> Jacob Seeley
>>>> Sr. Infrastructure Engineer
>>>> VertitechIT
>>>> 413-268-1631
>>>>
>>>> www.vertitechit.com
>>>>
>>>> -----Original Message-----
>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>> Sent: Wednesday, July 27, 2016 8:43 PM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> Hi Jacob
>>>>
>>>> I gave this a second read - if your issue is Router VM in starting 
>>>> mode
>>>> - but not started - it means cloudstack agent on routerVM cannot 
>>>> talk to
>>> management server on 8250 over POD network.
>>>>
>>>> Another reason would be an issue of hypervisor accessing the NFS 
>>>> mount
>>> used for secondary storage.
>>>>
>>>> Use console of vCenter to see what is happening on router vm. You 
>>>> can
>>> login locally with root/password and see the content of 
>>> /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...
>>>>
>>>> you can also run /etc/init.d/cloud stop and start.. that will give 
>>>> you a
>>> fresh start on logs..
>>>>
>>>> also, confirm that management server can talk to VR on POD IP
>>>> (management) on port 3922..
>>>>
>>>> Regards
>>>> ilya
>>>>
>>>> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>>>>> ilya,
>>>>>
>>>>> Here are the contents of the secondary storage:
>>>>>
>>>>> .
>>>>> ./template
>>>>> ./template/tmpl
>>>>> ./template/tmpl/1
>>>>> ./template/tmpl/1/8
>>>>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>>>>> ./template/tmpl/1/8/template.properties
>>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
>>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
>>>>> ./template/tmpl/1/7
>>>>> ./template/tmpl/1/7/template.properties
>>>>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>>>>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>>>>> ./systemvm
>>>>> ./systemvm/systemvm-4.8.0.1.iso
>>>>> ./systemvm/.lck-bf162a0100000000
>>>>> ./snapshots
>>>>> ./volumes
>>>>>
>>>>> I've noticed that both the Secondary Storage VM and Console Proxy 
>>>>> VM
>>> mount this ISO and as stated before, they come up just fine.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jacob Seeley
>>>>> Sr. Infrastructure Engineer
>>>>> VertitechIT
>>>>> 413-268-1631
>>>>>
>>>>> www.vertitechit.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>>> Sent: Wednesday, July 27, 2016 3:22 AM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> Jacob
>>>>>
>>>>> The upgrade usually occurs through systemvm.iso - that is generated 
>>>>> by
>>> cloudstack on the first start.
>>>>>
>>>>> Please show the content of your secondary store specifically
>>>>>
>>>>> /mnt/[secondary-storage]/systemvm
>>>>>
>>>>> Regards
>>>>> ilya
>>>>>
>>>>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>>>>> Here is a pastebin snippet of the management-server.log - 
>>>>>> http://pastebin.com/GCLm53Gz
>>>>>>
>>>>>> Hopefully the relevant data is in there.
>>>>>>
>>>>>> I made sure to start from scratch for this example. Everything 
>>>>>> from
>>> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack 
>>> install is fresh. I deployed a new instance in CloudStack, a VM 
>>> internally named i-2-3-VM with an IP address of 192.168.0.78. This 
>>> prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Jacob Seeley
>>>>>> Sr. Infrastructure Engineer
>>>>>> VertitechIT
>>>>>> 413-268-1631
>>>>>>
>>>>>> www.vertitechit.com
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>>>>> Sent: Monday, July 25, 2016 1:37 AM
>>>>>> To: users@cloudstack.apache.org
>>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>
>>>>>> please upload the logs in the issue.
>>>>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang 
>>>>>>> <da...@gmail.com>
>>> wrote:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>>>>
>>>>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> What template are you using to start your first VM? - the 
>>>>>>>> default vmware template?
>>>>>>>> If you look in vcenter , what does the console show you ?
>>>>>>>>
>>>>>>>>
>>>>>>>> Glenn
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> glenn.wagner@shapeblue.com
>>>>>>>> www.shapeblue.com
>>>>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape 
>>>>>>>> Town 7130South Africa @shapeblue
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>>>>> To: users@cloudstack.apache.org
>>>>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>>>
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>>>>>
>>>>>>>> When trying to launch the first VM, the VS is created. VS 
>>>>>>>> starts up, but in CS, it stuck with "starting" state.
>>>>>>>>
>>>>>>>> i can't find any usefull information in the logs.
>>>>>>>>
>>>>>>>> any hint?
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
>>

Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by ilya <il...@gmail.com>.
Jacob

So you are setting up a basic zone, which means you have a single network
for both hypervisors and guest VMs (or everything). In that case the
control network should be 0.0.0.0, since there are no other networks. I
was under the assumption that you were using an advanced zone, but it's clearer now.

I'd suggest you start off with 4.5 for the time being - while we raise a
blocker issue for 4.8 (and possibly 4.9).

I'll see if i can spend some cycles to investigate this.

Regards
ilya



On 7/29/16 7:20 AM, Jacob Seeley wrote:
> ilya,
> 
> I'm using a Basic zone. Here is the workflow I'm using with actual IP addresses. Any fields I left out you can assume I leave blank.
> 
> Add Zone
> 
> Zone Type: Basic
> Name: ZONE1
> IPv4 DNS: 10.70.116.20
> Internal DNS 1: 10.70.116.20
> Hypervisor: VMware
> Network Offering: DefaultSharedNetworkOffering
> 
> Physical Network
> 
> Management: vSwitch Name: vSwitch0
> Guest: vSwitch Name: vSwitch0
> 
> Pod name: POD1
> Reserved system gateway: 10.70.116.1
> Reserved system netmask: 255.255.255.0
> Start Reserved system IP: 10.70.116.60
> End Reserved system IP: 10.70.116.79
> 
> Guest Gateway: 10.70.116.1
> Guest Netmask: 255.255.255.0
> Guest start IP: 10.70.116.80
> Guest end IP: 10.70.116.99
> 
> The rest is Storage and is probably irrelevant here.
> 
> After I go through the wizard of adding a zone, it asks me to enable it, which I do. Without any further action, 2 System VMs (Console and Secondary) are created. The default CentOS template is downloaded. Both System VMs receive 2x IP addresses, one on the Pod network and one on the guest network.
> 
> System Storage VM
> Public IP address: 10.70.116.81
> Private IP address: 10.70.116.73
> Gateway: 10.70.116.1
> 
> Console Proxy VM
> Public IP address: 10.70.116.80
> Private IP address: 10.70.116.74
> Gateway: 10.70.116.1
> 
> Only when I initiate my first VM from template (or even ISO) is a Virtual Router deployed. As mentioned before, it gets two NICs, the first being of Traffic Type Guest with an IP address of 10.70.116.92 and the second of Traffic Type Control with no IP address assigned (it reports 0.0.0.0). Ultimately the virtual router gets deployed on the hypervisor (VMware) but it's useless. The instance I tried deploying ultimately fails. I suspect that this is the problem or a problem. The virtual router gets an IP address on the guest network but not the management network.
> 
> Regarding the cloud agent/service (/etc/init.d/cloud) on the virtual router: I mentioned earlier that /etc/init.d/cloud on the virtual router fails. I found this happens because /usr/local/cloud/systemvm never gets populated on the virtual router. Going further down the rabbit hole, I see there is a script, /opt/cloud/bin/patchsystemvm.sh, that is responsible for mounting the systemvm.iso and unzipping the contents to /usr/local/cloud/systemvm. Both System VMs (console and secondary) do this but not the virtual router. From what I can tell, the reason for this is as follows. If you look at the script (found here: https://github.com/apache/cloudstack/blob/master/systemvm/patches/debian/config/opt/cloud/bin/patchsystemvm.sh)
> 
> There is a function of the script called patch_console_proxy. This function gets called only if the following is satisfied: if [ "$TYPE" == "consoleproxy" ] || [ "$TYPE" == "secstorage" ] && [ -f ${PATCH_MOUNT}/systemvm.zip ]
> 
> I've noticed that the value of TYPE, in every case I've tried this with the virtual router, is dhcpsrvr. According to that script, the function that gets called for TYPE=dhcpsrvr is dhcpsrvr_svcs. That function does the following:
> 
> dhcpsrvr_svcs() {
>    chkconfig cloud off
>    chkconfig cloud-passwd-srvr on ; 
>    chkconfig haproxy off ; 
>    chkconfig dnsmasq on
>    chkconfig ssh on
>    chkconfig nfs-common off
>    chkconfig portmap off
>    chkconfig keepalived off
>    chkconfig conntrackd off
>    echo "ssh dnsmasq cloud-passwd-srvr apache2" > /var/cache/cloud/enabled_svcs
>    echo "cloud nfs-common haproxy portmap" > /var/cache/cloud/disabled_svcs
> } 
> 
> Here you can see that it turns off the cloud service. As far as I can tell, my virtual router is executing this function, so this is expected behavior. This tells me that the cloud service is never meant to run when TYPE=dhcpsrvr.
> 
> As you suggested before, I've since tried manually assigning an IP address to NIC1 on the virtual router, but it doesn't seem to help or do anything.
> 
> Having never had a working CloudStack setup before, it is harder for me to debug the issue, since I'm not always sure how something should work. Unfortunately, I have no choice but to try and make this work on the VMware hypervisor. I think what I'm going to do now is set up a KVM hypervisor and see if I can get this working using the same software versions and workflow.
> 
> Thank you for your help.
> 
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> 
> www.vertitechit.com
> 
> -----Original Message-----
> From: ilya [mailto:ilya.mailing.lists@gmail.com] 
> Sent: Friday, July 29, 2016 2:43 AM
> To: users@cloudstack.apache.org
> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> 
> Darren
> 
> I'm also running 4.5.2 and like the stability we get with it.
> 
> For the features we need, 4.5.2 has everything that is required, so I don't see a huge benefit in upgrading to the latest ACS at the moment. Also, our environments are very large and complex, so an upgrade is not something I can take lightly.
> 
> With that said, I do have a small 8-node lab environment I can try the upgrade on. It consists of 4 ESXi and 4 KVM nodes, so it should be a fair test.
> 
> Let's wait for Jacob to respond with his test of setting up the IP/netmask for eth1 on the router VM. If it does not help, I'll try the upgrade to see if I can reproduce the issue.
> 
> Regards
> ilya
> 
> On 7/28/16 9:43 PM, Darren Tang wrote:
>> Hi ilya:
>>  I can confirm that issue; please check:
>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>  When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a 
>> basic zone, the VR never leaves the "starting" state. Falling back 
>> to 4.5 works fine.
>>  Maybe you can test it yourself.
>>
>> 2016-07-29 3:24 GMT+08:00 ilya <il...@gmail.com>:
>>
>>> I guess it would help to know what type of zone you use?
>>>
>>> Is it advanced, isolated vpc or shared network? what type of isolation?
>>> or perhaps basic zone?
>>>
>>> Lastly, try stopping the iptables and restarting cloud agent (via 
>>> stop and start)
>>>
>>> Please see my response in-line
>>>
>>> On 7/28/16 6:58 AM, Jacob Seeley wrote:
>>>> Hi ilya,
>>>>
>>>> Funny you brought up debugging the router VM. After responding
>>> yesterday, I did just that and found some odd things.
>>>> Just to be clear (I think we're on the same page), since I'm not the 
>>>> OP
>>> of this thread, the virtual router always gets deployed and it starts 
>>> up just fine; however, CloudStack reports that it's always stuck in starting.
>>> VMs that get deployed ultimately fail. CloudStack reports the router 
>>> version as UNKNOWN.
>>>> Before I provide what I found debugging the router VM, I'll address 
>>>> some
>>> of your points.
>>>>
>>>> ### FOLLOW-UP QUESTIONS ###
>>>>
>>>> " Another reason would be an issue of hypervisor accessing the NFS 
>>>> mount
>>> used for secondary storage."
>>>> I don't believe this is an issue. The hypervisor (VMware) does mount 
>>>> the
>>> secondary storage via NFS just fine. If this were an issue, I would 
>>> think the Secondary Storage and Console VMs would not deploy.
>>>>
>>>> " Use console of vCenter to see what is happening on router vm. You 
>>>> can
>>> login locally with root/password and see the content of 
>>> /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
>>>> It looks to me like /var/log/cloud.out is only written to when
>>> $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script.
>>> As such, there isn't even a file for /var/log/cloud.out. Even when I 
>>> set that variable, I never get anything logged to /var/log/cloud.out. 
>>> However, there is a /var/log/cloud.log. Here are the contents of that:
>>> http://pastebin.com/aaTsRKZE
>>>>
>>>> " you can also run /etc/init.d/cloud stop and start.. that will give 
>>>> you
>>> a fresh start on logs.."
>>>> The service is in a failed state. It's worth noting that this 
>>>> service is
>>> in a started state on the Console and Secondary Storage VMs.
>>>
>>> this is concerning - see you did "sh -x", read on..
>>>
>>>>
>>>> " also, confirm that management server can talk to VR on POD IP
>>>> (management) on port 3922.."
>>>> It appears this is not an issue; see below:
>>>
>>> 3922 from MS to VR - this is the SSH daemon on VR with private key
>>> 8250 from VR to MS - cloudstack java agent on VR talking to MS
>>>
>>>
>>>>
>>>> root@r-4-VM:~# telnet 10.70.110.101 8250
>>>> Trying 10.70.110.101...
>>>> Connected to 10.70.110.101.
>>>> Escape character is '^]'.
>>>>
>>>
>>>
>>>> ### ROUTER VM DEBUG ###
>>>>
>>>> Here is what I found when the router VM gets deployed (please tell me if
>>> anything seems off):
>>>> 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an 
>>>> IP
>>> address coming from the defaultGuestNetwork. NIC2 is traffic type 
>>> Control but has an IP address of 0.0.0.0
>>>
>>> It is an issue for concern to see 0.0.0.0 assigned to eth1
>>>
>>> Lets assume NIC1 (as eth0) and NIC2 (as eth1).
>>>
>>> 1) we should not be getting 0.0.0.0 for eth1 - aka control network. 
>>> This IP should be coming from the POD network range -> when you added 
>>> a pod - i assume you did it as part of Add Zone wizard...
>>>
>>> To see the PODIP range, goto UI
>>> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 
>>> 1 (assume you did not create anything special), Management, IP Ranges 
>>> -> you should see a range defined there and it should not be 0.0.0.0...
>>>
>>>> From the CloudStack management server, I cannot SSH into the router 
>>>> VM
>>> on NIC1. I've found this is because of iptables rules on the router 
>>> VM. If I issue a /etc/init.d/iptables-persistent flush on the router 
>>> VM, I can SSH into the router VM using the SSH key at port 3922.
>>>> The service "cloud" is in a failed state. Looking at the cloud init
>>> script, I see the following:
>>>>
>>>> CMDLINE=$(cat /var/cache/cloud/cmdline)
>>>>
>>>> TYPE="router"
>>>> for i in $CMDLINE
>>>>   do
>>>>     # search for foo=bar pattern and cut out foo
>>>>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>>>>     case $FIRSTPATTERN in
>>>>       type)
>>>>           TYPE=$(echo $i | cut -d= -f2)
>>>>       ;;
>>>>     esac
>>>> done
>>>>
>>>> The file /var/cache/cloud/cmdline exists; here are the contents:
>>>>
>>>> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0
>>> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
>>> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr 
>>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 
>>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldO
>>> vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37
>>> aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> host=10.70.110.101 port=8080 
>>> nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>>
>>>
>>>
>>> You can also try updating your  /var/cache/cloud/cmdline with proper 
>>> value for eth1ip=0.0.0.0 eth1mask=0.0.0.0, you can look it up under 
>>> Infrastructure, Routers, r-4, Nics and look for control nic..
>>>
>>> Then try starting the cloud service..
>>>
>>> Also, did you enable baremetal support? can you deploy a zone without 
>>> baremetal support? Perhaps there is a bug on how IPs are assigned to
>>> eth1 (control nic)...
>>>
>>>
>>>> The previous code suggests that the value of TYPE starts as router 
>>>> but
>>> will get set to dhcpsrvr, as indicated by the contents of 
>>> /var/cache/cloud/cmdline. Is this normal?
>>>> Further down the script, I see:
>>>>
>>>> CLOUDSTACK_HOME="/usr/local/cloud"
>>> <----------------------------------------Exists
>>>> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];
>>> <----------------------------------------Does not exist. Seems odd!
>>>> then
>>>>   . $CLOUDSTACK_HOME/systemvm/utils.sh
>>>> else
>>>>   _failure
>>>> fi
>>>>
>>>> # mkdir -p /var/log/vmops
>>>>
>>>> start() {
>>>>    local pid=$(get_pids)
>>>>    if [ "$pid" != "" ]; then
>>>>        echo "CloudStack cloud sevice is already running, PID = $pid"
>>>>        return 0
>>>>    fi
>>>>
>>>>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>>>>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];
>>> <------------------------------------------------------Does not exist.
>>> Seems odd!
>>>>    then
>>>>      if [ "$pid" == "" ]
>>>>      then
>>>>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>>>>        pid=$(get_pids)
>>>>        echo $pid > /var/run/cloud.pid
>>>>      fi
>>>>      _success
>>>>    else
>>>>      _failure
>>>>    fi
>>>>    echo
>>>>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
>>>> }
>>>>
>>>> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
>>> exists; however, the script then looks for the file 
>>> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It is also 
>>> supposed to start the script run.sh, but that also doesn't 
>>> exist. This seems like a problem to me.
>>>> Here you can step through what happens when I try to start the cloud service:
>>>>
>>>> sh -x /etc/init.d/cloud start
>>>> + ENABLED=0
>>>> + [ -e /etc/default/cloud ]
>>>> + . /etc/default/cloud
>>>> + ENABLED=0
>>>> + cat /var/cache/cloud/cmdline
>>>> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
>>> eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com
>>> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 
>>> mgmtcidr=
>>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr 
>>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 
>>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldO
>>> vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37
>>> aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> host=10.70.110.101 port=8080 
>>> nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>> + [ ! -z ]
>>>> + LOG_FILE=/dev/null
>>>> + TYPE=router
>>>> + cut -d= -f1
>>>> + echo template=domP
>>>> + FIRSTPATTERN=template
>>>> + cut -d= -f1
>>>> + echo name=r-4-VM
>>>> + FIRSTPATTERN=name
>>>> + cut -d= -f1
>>>> + echo eth0ip=10.70.116.75
>>>> + FIRSTPATTERN=eth0ip
>>>> + cut -d= -f1
>>>> + echo eth0mask=255.255.255.0
>>>> + FIRSTPATTERN=eth0mask
>>>> + cut -d= -f1
>>>> + echo gateway=10.70.116.1
>>>> + FIRSTPATTERN=gateway
>>>> + cut -d= -f1
>>>> + echo domain=vit.vertitechit.com
>>>> + FIRSTPATTERN=domain
>>>> + cut -d= -f1
>>>> + echo cidrsize=24
>>>> + FIRSTPATTERN=cidrsize
>>>> + cut -d= -f1
>>>> + echo dhcprange=10.70.116.1
>>>> + FIRSTPATTERN=dhcprange
>>>> + cut -d= -f1
>>>> + echo eth1ip=0.0.0.0
>>>> + FIRSTPATTERN=eth1ip
>>>> + cut -d= -f1
>>>> + echo eth1mask=0.0.0.0
>>>> + FIRSTPATTERN=eth1mask
>>>> + cut -d= -f1
>>>> + echo mgmtcidr=10.70.110.0/24
>>>> + FIRSTPATTERN=mgmtcidr
>>>> + cut -d= -f1
>>>> + echo localgw=10.70.116.1
>>>> + FIRSTPATTERN=localgw
>>>> + cut -d= -f1
>>>> + echo sshonguest=true
>>>> + FIRSTPATTERN=sshonguest
>>>> + cut -d= -f1
>>>> + echo type=dhcpsrvr
>>>> + FIRSTPATTERN=type
>>>> + cut -d= -f2
>>>> + echo type=dhcpsrvr
>>>> + TYPE=dhcpsrvr
>>>> + cut -d= -f1
>>>> + echo disable_rp_filter=true
>>>> + FIRSTPATTERN=disable_rp_filter
>>>> + cut -d= -f1
>>>> + echo extra_pubnics=2
>>>> + FIRSTPATTERN=extra_pubnics
>>>> + cut -d= -f1
>>>> + echo dns1=10.70.10.21
>>>> + FIRSTPATTERN=dns1
>>>> + cut -d= -f1
>>>> + echo
>>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldO
>>> vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>>> + FIRSTPATTERN=baremetalnotificationsecuritykey
>>>> + cut -d= -f1
>>>> + echo
>>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37
>>> aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>>> + FIRSTPATTERN=baremetalnotificationapikey
>>>> + cut -d= -f1
>>>> + echo host=10.70.110.101
>>>> + FIRSTPATTERN=host
>>>> + cut -d= -f1
>>>> + echo port=8080
>>>> + FIRSTPATTERN=port
>>>> + cut -d= -f1
>>>> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>> + FIRSTPATTERN=nic_macs
>>>> + [ -f /etc/init.d/functions ]
>>>> + [ -f ./lib/lsb/init-functions ]
>>>> + RETVAL=0
>>>> + CLOUDSTACK_HOME=/usr/local/cloud
>>>> + [ -f /usr/local/cloud/systemvm/utils.sh ]
>>>> + _failure
>>>> + [ -f /etc/init.d/functions ]
>>>> + echo Failed
>>>> Failed
>>>> + [ 0 != 0 ]
>>>> + exit 0
>>>>
>>>> Thoughts?
>>>>
>>>> Jacob Seeley
>>>> Sr. Infrastructure Engineer
>>>> VertitechIT
>>>> 413-268-1631
>>>>
>>>> www.vertitechit.com
>>>>
>>>> -----Original Message-----
>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>> Sent: Wednesday, July 27, 2016 8:43 PM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> Hi Jacob
>>>>
>>>> I gave this a second read - if your issue is Router VM in starting 
>>>> mode
>>>> - but not started - it means cloudstack agent on routerVM cannot 
>>>> talk to
>>> management server on 8250 over POD network.
>>>>
>>>> Another reason would be an issue of hypervisor accessing the NFS 
>>>> mount
>>> used for secondary storage.
>>>>
>>>> Use console of vCenter to see what is happening on router vm. You 
>>>> can
>>> login locally with root/password and see the content of 
>>> /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...
>>>>
>>>> you can also run /etc/init.d/cloud stop and start.. that will give 
>>>> you a
>>> fresh start on logs..
>>>>
>>>> also, confirm that management server can talk to VR on POD IP
>>>> (management) on port 3922..
>>>>
>>>> Regards
>>>> ilya
>>>>
>>>> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>>>>> ilya,
>>>>>
>>>>> Here are the contents of the secondary storage:
>>>>>
>>>>> .
>>>>> ./template
>>>>> ./template/tmpl
>>>>> ./template/tmpl/1
>>>>> ./template/tmpl/1/8
>>>>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>>>>> ./template/tmpl/1/8/template.properties
>>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-
>>>>> vmw
>>>>> are.ovf
>>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-
>>>>> vmw
>>>>> are-disk3.vmdk
>>>>> ./template/tmpl/1/7
>>>>> ./template/tmpl/1/7/template.properties
>>>>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>>>>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>>>>> ./systemvm
>>>>> ./systemvm/systemvm-4.8.0.1.iso
>>>>> ./systemvm/.lck-bf162a0100000000
>>>>> ./snapshots
>>>>> ./volumes
>>>>>
>>>>> I've noticed that both the Secondary Storage VM and Console Proxy 
>>>>> VM
>>> mount this ISO and as stated before, they come up just fine.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jacob Seeley
>>>>> Sr. Infrastructure Engineer
>>>>> VertitechIT
>>>>> 413-268-1631
>>>>>
>>>>> www.vertitechit.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>>> Sent: Wednesday, July 27, 2016 3:22 AM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> Jacob
>>>>>
>>>>> The upgrade usually occurs though systemvm.iso - that is generated 
>>>>> by
>>> cloudstack on the first start.
>>>>>
>>>>> Please show the content of your secondary store specifically
>>>>>
>>>>> /mnt/[secondary-storage]/systemvm
>>>>>
>>>>> Regards
>>>>> ilya
>>>>>
>>>>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>>>>> Here is a pastebin snippet the management-server.log - 
>>>>>> http://pastebin.com/GCLm53Gz
>>>>>>
>>>>>> Hopefully the relevant data is in there.
>>>>>>
>>>>>> I made sure to start from scratch for this example. Everything 
>>>>>> from
>>> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack 
>>> install is fresh. I deployed a new instance in CloudStack, a VM 
>>> internally named i-2-3-VM with an IP address of 192.168.0.78. This 
>>> prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>>>>
>>>>>> Thank you,
>>>>>>
>>>>>> Jacob Seeley
>>>>>> Sr. Infrastructure Engineer
>>>>>> VertitechIT
>>>>>> 413-268-1631
>>>>>>
>>>>>> www.vertitechit.com
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>>>>> Sent: Monday, July 25, 2016 1:37 AM
>>>>>> To: users@cloudstack.apache.org
>>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>
>>>>>> please upload the logs in the issue.
>>>>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <da...@gmail.com>
>>> wrote:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>>>>
>>>>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> What template are you using to start your first VM? - the 
>>>>>>>> default vmware template?
>>>>>>>> If you look in vcenter , what does the console show you ?
>>>>>>>>
>>>>>>>>
>>>>>>>> Glenn
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> glenn.wagner@shapeblue.com
>>>>>>>> www.shapeblue.com
>>>>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape 
>>>>>>>> Town 7130South Africa @shapeblue
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>>>>> To: users@cloudstack.apache.org
>>>>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>>>
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>>>>>
>>>>>>>> When trying to launch the first VM, the VS is created. VS starts 
>>>>>>>> up, but in CS, it stuck with "starting" state.
>>>>>>>>
>>>>>>>> i can't find any usefull information in the logs.
>>>>>>>>
>>>>>>>> any hint?
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>
>>

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Jacob Seeley <js...@vertitechit.com>.
ilya,

I'm using a Basic zone. Here is the workflow I'm using with actual IP addresses. Any fields I left out you can assume I leave blank.

Add Zone

Zone Type: Basic
Name: ZONE1
IPv4 DNS: 10.70.116.20
Internal DNS 1: 10.70.116.20
Hypervisor: VMware
Network Offering: DefaultSharedNetworkOffering

Physical Network

Management: vSwitch Name: vSwitch0
Guest: vSwitch Name: vSwitch0

Pod name: POD1
Reserved system gateway: 10.70.116.1
Reserved system netmask: 255.255.255.0
Start Reserved system IP: 10.70.116.60
End Reserved system IP: 10.70.116.79

Guest Gateway: 10.70.116.1
Guest Netmask: 255.255.255.0
Guest start IP: 10.70.116.80
Guest end IP: 10.70.116.99
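
As a sanity check on the ranges above, here is a minimal sketch (not part of CloudStack; the helper names ip_to_int and in_range are made up for illustration) that tests whether an address falls inside the reserved system range or the guest range. It assumes a POSIX shell and plain dotted-quad IPv4 addresses.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with CloudStack.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Return 0 if address $1 lies within [$2, $3], inclusive.
in_range() {
    ip=$(ip_to_int "$1")
    lo=$(ip_to_int "$2")
    hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

# Classify a few addresses against the two ranges from the zone wizard.
for addr in 10.70.116.73 10.70.116.74 10.70.116.92 0.0.0.0; do
    if in_range "$addr" 10.70.116.60 10.70.116.79; then
        echo "$addr: reserved system (pod) range"
    elif in_range "$addr" 10.70.116.80 10.70.116.99; then
        echo "$addr: guest range"
    else
        echo "$addr: outside both ranges"
    fi
done
```

A control NIC reporting 0.0.0.0 lands in neither range, which is the symptom discussed in this thread.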

The rest is Storage and is probably irrelevant here.

After I go through the wizard of adding a zone, it asks me to enable it, which I do. Without any further action, 2 System VMs (Console and Secondary) are created. The default CentOS template is downloaded. Both System VMs receive 2x IP addresses, one on the Pod network and one on the guest network.

Secondary Storage VM
Public IP address: 10.70.116.81
Private IP address: 10.70.116.73
Gateway: 10.70.116.1

Console Proxy VM
Public IP address: 10.70.116.80
Private IP address: 10.70.116.74
Gateway: 10.70.116.1

Only when I initiate my first VM from template (or even ISO) is a Virtual Router deployed. As mentioned before, it gets two NICs: the first is of Traffic Type Guest with an IP address of 10.70.116.92, and the second is of Traffic Type Control with no IP address assigned (it reports 0.0.0.0). The virtual router does get deployed on the hypervisor (VMware), but it's useless, and the instance I tried deploying ultimately fails. I suspect that this is the problem, or at least a problem: the virtual router gets an IP address on the guest network but not the management network.
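
To check the same thing from inside the router VM, the following sketch looks up a key in /var/cache/cloud/cmdline. The cmdline_get helper is hypothetical (it is not a CloudStack tool), and the sample line is an abbreviated copy of the cmdline posted earlier in this thread, written to a temporary file so the snippet can run anywhere.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# cmdline_get KEY [FILE]: print the value of KEY from a file of
# space-separated key=value pairs; return 1 if KEY is absent.
cmdline_get() {
    key=$1
    file=${2:-/var/cache/cloud/cmdline}
    for pair in $(cat "$file"); do
        case $pair in
            "$key"=*)
                echo "${pair#*=}"   # strip everything up to the first '='
                return 0
                ;;
        esac
    done
    return 1
}

# Abbreviated sample taken from the cmdline quoted in this thread.
sample=$(mktemp)
printf '%s\n' "template=domP name=r-4-VM eth0ip=10.70.116.75 eth1ip=0.0.0.0 eth1mask=0.0.0.0 type=dhcpsrvr" > "$sample"

if [ "$(cmdline_get eth1ip "$sample")" = "0.0.0.0" ]; then
    echo "control NIC (eth1) has no usable address"
fi
echo "router type: $(cmdline_get type "$sample")"

rm -f "$sample"
```

On a real router VM, the second argument can be omitted so that the helper reads /var/cache/cloud/cmdline directly.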

Regarding the cloud agent/service (/etc/init.d/cloud) on the virtual router: I mentioned earlier that /etc/init.d/cloud on the virtual router fails. I found this happens because /usr/local/cloud/systemvm never gets populated on the virtual router. Going further down the rabbit hole, I see there is a script, /opt/cloud/bin/patchsystemvm.sh, that is responsible for mounting the systemvm.iso and unzipping the contents to /usr/local/cloud/systemvm. Both System VMs (console and secondary) do this, but the virtual router does not. From what I can tell, the reason is in the script itself (found here: https://github.com/apache/cloudstack/blob/master/systemvm/patches/debian/config/opt/cloud/bin/patchsystemvm.sh).

The script has a function called patch_console_proxy, which gets called only if the following condition is satisfied: if [ "$TYPE" == "consoleproxy" ] || [ "$TYPE" == "secstorage" ] && [ -f ${PATCH_MOUNT}/systemvm.zip ]
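
One thing worth noting about that condition: in a shell AND-OR list, && and || have equal precedence and are evaluated left to right, so it groups as (A || B) && C rather than A || (B && C). The sketch below mimics the condition with a made-up function, would_patch, and a yes/no flag standing in for the systemvm.zip file test; it uses = instead of == so that it also runs under a strict POSIX sh.

```shell
#!/bin/sh
# Hypothetical stand-in for the condition in patchsystemvm.sh.
# would_patch VM_TYPE ZIP_PRESENT: print "patch" or "skip".
would_patch() {
    vm_type=$1
    zip_present=$2   # "yes" or "no"; stands in for [ -f .../systemvm.zip ]
    if [ "$vm_type" = "consoleproxy" ] || [ "$vm_type" = "secstorage" ] && [ "$zip_present" = "yes" ]; then
        echo "patch"
    else
        echo "skip"
    fi
}

# Walk every combination to show how the list is grouped.
for t in consoleproxy secstorage dhcpsrvr; do
    for z in yes no; do
        echo "type=$t zip=$z -> $(would_patch "$t" "$z")"
    done
done
```

With this grouping, a router of type dhcpsrvr is never patched regardless of whether the zip is present, and a consoleproxy is skipped when the zip is missing.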

I've noticed that the value of TYPE, in every case I've tried this with the virtual router, is dhcpsrvr. According to that script, the function that gets called for TYPE=dhcpsrvr is dhcpsrvr_svcs. That function does the following:

dhcpsrvr_svcs() {
   chkconfig cloud off
   chkconfig cloud-passwd-srvr on ; 
   chkconfig haproxy off ; 
   chkconfig dnsmasq on
   chkconfig ssh on
   chkconfig nfs-common off
   chkconfig portmap off
   chkconfig keepalived off
   chkconfig conntrackd off
   echo "ssh dnsmasq cloud-passwd-srvr apache2" > /var/cache/cloud/enabled_svcs
   echo "cloud nfs-common haproxy portmap" > /var/cache/cloud/disabled_svcs
} 

Here you can see that it turns off the cloud service. As far as I can tell, my virtual router is executing this function, so this is expected behavior. This tells me that the cloud service is never meant to run when TYPE=dhcpsrvr.

As you suggested before, I've since tried manually assigning an IP address to NIC1 on the virtual router, but it doesn't seem to help or do anything.

Having never had a working CloudStack setup before, it is harder for me to debug the issue, since I'm not always sure how something should work. Unfortunately, I have no choice but to try and make this work on the VMware hypervisor. I think what I'm going to do now is set up a KVM hypervisor and see if I can get this working using the same software versions and workflow.

Thank you for your help.

Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631

www.vertitechit.com

-----Original Message-----
From: ilya [mailto:ilya.mailing.lists@gmail.com] 
Sent: Friday, July 29, 2016 2:43 AM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

Darren

I'm also running 4.5.2 and like the stability we get with it.

For the features we need, 4.5.2 has everything that is required, so I don't see a huge benefit in upgrading to the latest ACS at the moment. Also, our environments are very large and complex, so an upgrade is not something I can take lightly.

With that said, I do have a small 8-node lab environment I can try the upgrade on. It consists of 4 ESXi and 4 KVM nodes, so it should be a fair test.

Let's wait for Jacob to respond with his test of setting up the IP/netmask for eth1 on the router VM. If it does not help, I'll try the upgrade to see if I can reproduce the issue.

Regards
ilya

On 7/28/16 9:43 PM, Darren Tang wrote:
> Hi ilya:
>  I can confirm that issue; please check:
> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>  When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a 
> basic zone, the VR never leaves the "starting" state. Falling back 
> to 4.5 works fine.
>  Maybe you can test it yourself.
> 
> 2016-07-29 3:24 GMT+08:00 ilya <il...@gmail.com>:
> 
>> I guess it would help to know what type of zone you use?
>>
>> Is it advanced, isolated vpc or shared network? what type of isolation?
>> or perhaps basic zone?
>>
>> Lastly, try stopping the iptables and restarting cloud agent (via 
>> stop and start)
>>
>> Please see my response in-line
>>
>> On 7/28/16 6:58 AM, Jacob Seeley wrote:
>>> Hi ilya,
>>>
>>> Funny you brought up debugging the router VM. After responding
>> yesterday, I did just that and found some odd things.
>>> Just to be clear (I think we're on the same page), since I'm not the 
>>> OP
>> of this thread, the virtual router always gets deployed and it starts 
>> up just fine; however, CloudStack reports that it's always stuck in starting.
>> VMs that get deployed ultimately fail. CloudStack reports the router 
>> version as UNKNOWN.
>>> Before I provide what I found debugging the router VM, I'll address 
>>> some
>> of your points.
>>>
>>> ### FOLLOW-UP QUESTIONS ###
>>>
>>> " Another reason would be an issue of hypervisor accessing the NFS 
>>> mount
>> used for secondary storage."
>>> I don't believe this is an issue. The hypervisor (VMware) does mount 
>>> the
>> secondary storage via NFS just fine. If this were an issue, I would 
>> think the Secondary Storage and Console VMs would not deploy.
>>>
>>> " Use console of vCenter to see what is happening on router vm. You 
>>> can
>> login locally with root/password and see the content of 
>> /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
>>> It looks to me like /var/log/cloud.out is only written to when
>> $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script.
>> As such, there isn't even a file for /var/log/cloud.out. Even when I 
>> set that variable, I never get anything logged to /var/log/cloud.out. 
>> However, there is a /var/log/cloud.log. Here are the contents of that:
>> http://pastebin.com/aaTsRKZE
>>>
>>> " you can also run /etc/init.d/cloud stop and start.. that will give 
>>> you
>> a fresh start on logs.."
>>> The service is in a failed state. It's worth noting that this 
>>> service is
>> in a started state on the Console and Secondary Storage VMs.
>>
>> this is concerning - see you did "sh -x", read on..
>>
>>>
>>> " also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922.."
>>> It appears this is not an issue; see below:
>>
>> 3922 from MS to VR - this is the SSH daemon on VR with private key
>> 8250 from VR to MS - cloudstack java agent on VR talking to MS
>>
>>
>>>
>>> root@r-4-VM:~# telnet 10.70.110.101 8250
>>> Trying 10.70.110.101...
>>> Connected to 10.70.110.101.
>>> Escape character is '^]'.
>>>
>>
>>
>>> ### ROUTER VM DEBUG ###
>>>
>>> Here is what I found when the router VM gets deployed (please tell me if
>> anything seems off):
>>> 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an 
>>> IP
>> address coming from the defaultGuestNetwork. NIC2 is traffic type 
>> Control but has an IP address of 0.0.0.0
>>
>> It is an issue for concern to see 0.0.0.0 assigned to eth1
>>
>> Lets assume NIC1 (as eth0) and NIC2 (as eth1).
>>
>> 1) we should not be getting 0.0.0.0 for eth1 - aka control network. 
>> This IP should be coming from the POD network range -> when you added 
>> a pod - i assume you did it as part of Add Zone wizard...
>>
>> To see the PODIP range, goto UI
>> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 
>> 1 (assume you did not create anything special), Management, IP Ranges 
>> -> you should see a range defined there and it should not be 0.0.0.0...
>>
>>> From the CloudStack management server, I cannot SSH into the router 
>>> VM
>> on NIC1. I've found this is because of iptables rules on the router 
>> VM. If I issue a /etc/init.d/iptables-persistent flush on the router 
>> VM, I can SSH into the router VM using the SSH key at port 3922.
>>> The service "cloud" is in a failed state. Looking at the cloud init
>> script, I see the following:
>>>
>>> CMDLINE=$(cat /var/cache/cloud/cmdline)
>>>
>>> TYPE="router"
>>> for i in $CMDLINE
>>>   do
>>>     # search for foo=bar pattern and cut out foo
>>>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>>>     case $FIRSTPATTERN in
>>>       type)
>>>           TYPE=$(echo $i | cut -d= -f2)
>>>       ;;
>>>     esac
>>> done
>>>
>>> The file /var/cache/cloud/cmdline exists; here are the contents:
>>>
>>> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0
>> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
>> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr 
>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldO
>> vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37
>> aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>> host=10.70.110.101 port=8080 
>> nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>
>>
>>
>> You can also try updating your  /var/cache/cloud/cmdline with proper 
>> value for eth1ip=0.0.0.0 eth1mask=0.0.0.0, you can look it up under 
>> Infrastructure, Routers, r-4, Nics and look for control nic..
>>
>> Then try starting the cloud service..
>>
>> Also, did you enable baremetal support? can you deploy a zone without 
>> baremetal support? Perhaps there is a bug on how IPs are assigned to
>> eth1 (control nic)...
>>
>>
>>> The previous code suggests that the value of TYPE starts as router 
>>> but
>> will get set to dhcpsrvr, as indicated by the contents of 
>> /var/cache/cloud/cmdline. Is this normal?
>>> Further down the script, I see:
>>>
>>> CLOUDSTACK_HOME="/usr/local/cloud"
>> <----------------------------------------Exists
>>> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];
>> <----------------------------------------Does not exist. Seems odd!
>>> then
>>>   . $CLOUDSTACK_HOME/systemvm/utils.sh
>>> else
>>>   _failure
>>> fi
>>>
>>> # mkdir -p /var/log/vmops
>>>
>>> start() {
>>>    local pid=$(get_pids)
>>>    if [ "$pid" != "" ]; then
>>>        echo "CloudStack cloud sevice is already running, PID = $pid"
>>>        return 0
>>>    fi
>>>
>>>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>>>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];
>> <------------------------------------------------------Does not exist.
>> Seems odd!
>>>    then
>>>      if [ "$pid" == "" ]
>>>      then
>>>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>>>        pid=$(get_pids)
>>>        echo $pid > /var/run/cloud.pid
>>>      fi
>>>      _success
>>>    else
>>>      _failure
>>>    fi
>>>    echo
>>>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
>>> }
>>>
>>> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
>> exists; however, the script then looks for the file
>> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It is also
>> supposed to start the script run.sh, but that also doesn't
>> exist. This seems like a problem to me.
>>> Here you can see the step-through when I try to start the cloud service:
>>>
>>> sh -x /etc/init.d/cloud start
>>> + ENABLED=0
>>> + [ -e /etc/default/cloud ]
>>> + . /etc/default/cloud
>>> + ENABLED=0
>>> + cat /var/cache/cloud/cmdline
>>> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
>> eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com
>> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 
>> mgmtcidr=
>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr 
>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldO
>> vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37
>> aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>> host=10.70.110.101 port=8080 
>> nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + [ ! -z ]
>>> + LOG_FILE=/dev/null
>>> + TYPE=router
>>> + cut -d= -f1
>>> + echo template=domP
>>> + FIRSTPATTERN=template
>>> + cut -d= -f1
>>> + echo name=r-4-VM
>>> + FIRSTPATTERN=name
>>> + cut -d= -f1
>>> + echo eth0ip=10.70.116.75
>>> + FIRSTPATTERN=eth0ip
>>> + cut -d= -f1
>>> + echo eth0mask=255.255.255.0
>>> + FIRSTPATTERN=eth0mask
>>> + cut -d= -f1
>>> + echo gateway=10.70.116.1
>>> + FIRSTPATTERN=gateway
>>> + cut -d= -f1
>>> + echo domain=vit.vertitechit.com
>>> + FIRSTPATTERN=domain
>>> + cut -d= -f1
>>> + echo cidrsize=24
>>> + FIRSTPATTERN=cidrsize
>>> + cut -d= -f1
>>> + echo dhcprange=10.70.116.1
>>> + FIRSTPATTERN=dhcprange
>>> + cut -d= -f1
>>> + echo eth1ip=0.0.0.0
>>> + FIRSTPATTERN=eth1ip
>>> + cut -d= -f1
>>> + echo eth1mask=0.0.0.0
>>> + FIRSTPATTERN=eth1mask
>>> + cut -d= -f1
>>> + echo mgmtcidr=10.70.110.0/24
>>> + FIRSTPATTERN=mgmtcidr
>>> + cut -d= -f1
>>> + echo localgw=10.70.116.1
>>> + FIRSTPATTERN=localgw
>>> + cut -d= -f1
>>> + echo sshonguest=true
>>> + FIRSTPATTERN=sshonguest
>>> + cut -d= -f1
>>> + echo type=dhcpsrvr
>>> + FIRSTPATTERN=type
>>> + cut -d= -f2
>>> + echo type=dhcpsrvr
>>> + TYPE=dhcpsrvr
>>> + cut -d= -f1
>>> + echo disable_rp_filter=true
>>> + FIRSTPATTERN=disable_rp_filter
>>> + cut -d= -f1
>>> + echo extra_pubnics=2
>>> + FIRSTPATTERN=extra_pubnics
>>> + cut -d= -f1
>>> + echo dns1=10.70.10.21
>>> + FIRSTPATTERN=dns1
>>> + cut -d= -f1
>>> + echo
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldO
>> vhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> + FIRSTPATTERN=baremetalnotificationsecuritykey
>>> + cut -d= -f1
>>> + echo
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37
>> aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> + FIRSTPATTERN=baremetalnotificationapikey
>>> + cut -d= -f1
>>> + echo host=10.70.110.101
>>> + FIRSTPATTERN=host
>>> + cut -d= -f1
>>> + echo port=8080
>>> + FIRSTPATTERN=port
>>> + cut -d= -f1
>>> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + FIRSTPATTERN=nic_macs
>>> + [ -f /etc/init.d/functions ]
>>> + [ -f ./lib/lsb/init-functions ]
>>> + RETVAL=0
>>> + CLOUDSTACK_HOME=/usr/local/cloud
>>> + [ -f /usr/local/cloud/systemvm/utils.sh ]
>>> + _failure
>>> + [ -f /etc/init.d/functions ]
>>> + echo Failed
>>> Failed
>>> + [ 0 != 0 ]
>>> + exit 0
>>>
>>> Thoughts?
>>>
>>> Jacob Seeley
>>> Sr. Infrastructure Engineer
>>> VertitechIT
>>> 413-268-1631
>>>
>>> www.vertitechit.com
>>>
>>> -----Original Message-----
>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>> Sent: Wednesday, July 27, 2016 8:43 PM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>
>>> Hi Jacob
>>>
>>> I gave this a second read - if your issue is Router VM in starting 
>>> mode
>>> - but not started - it means cloudstack agent on routerVM cannot 
>>> talk to
>> management server on 8250 over POD network.
>>>
>>> Another reason would be an issue of hypervisor accessing the NFS 
>>> mount
>> used for secondary storage.
>>>
>>> Use console of vCenter to see what is happening on router vm. You 
>>> can
>> login locally with root/password and see the content of 
>> /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...
>>>
>>> you can also run /etc/init.d/cloud stop and start.. that will give 
>>> you a
>> fresh start on logs..
>>>
>>> also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922..
>>>
>>> Regards
>>> ilya
>>>
>>> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>>>> ilya,
>>>>
>>>> Here are the contents of the secondary storage:
>>>>
>>>> .
>>>> ./template
>>>> ./template/tmpl
>>>> ./template/tmpl/1
>>>> ./template/tmpl/1/8
>>>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>>>> ./template/tmpl/1/8/template.properties
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-
>>>> vmw
>>>> are.ovf
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-
>>>> vmw
>>>> are-disk3.vmdk
>>>> ./template/tmpl/1/7
>>>> ./template/tmpl/1/7/template.properties
>>>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>>>> ./systemvm
>>>> ./systemvm/systemvm-4.8.0.1.iso
>>>> ./systemvm/.lck-bf162a0100000000
>>>> ./snapshots
>>>> ./volumes
>>>>
>>>> I've noticed that both the Secondary Storage VM and Console Proxy 
>>>> VM
>> mount this ISO and as stated before, they come up just fine.
>>>>
>>>> Regards,
>>>>
>>>> Jacob Seeley
>>>> Sr. Infrastructure Engineer
>>>> VertitechIT
>>>> 413-268-1631
>>>>
>>>> www.vertitechit.com
>>>>
>>>> -----Original Message-----
>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>> Sent: Wednesday, July 27, 2016 3:22 AM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> Jacob
>>>>
>>>> The upgrade usually occurs though systemvm.iso - that is generated 
>>>> by
>> cloudstack on the first start.
>>>>
>>>> Please show the content of your secondary store specifically
>>>>
>>>> /mnt/[secondary-storage]/systemvm
>>>>
>>>> Regards
>>>> ilya
>>>>
>>>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>>>> Here is a pastebin snippet the management-server.log - 
>>>>> http://pastebin.com/GCLm53Gz
>>>>>
>>>>> Hopefully the relevant data is in there.
>>>>>
>>>>> I made sure to start from scratch for this example. Everything 
>>>>> from
>> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack 
>> install is fresh. I deployed a new instance in CloudStack, a VM 
>> internally named i-2-3-VM with an IP address of 192.168.0.78. This 
>> prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Jacob Seeley
>>>>> Sr. Infrastructure Engineer
>>>>> VertitechIT
>>>>> 413-268-1631
>>>>>
>>>>> www.vertitechit.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>>>> Sent: Monday, July 25, 2016 1:37 AM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> please upload the logs in the issue.
>>>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <da...@gmail.com>
>> wrote:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>>>
>>>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What template are you using to start your first VM? - the 
>>>>>>> default vmware template?
>>>>>>> If you look in vcenter , what does the console show you ?
>>>>>>>
>>>>>>>
>>>>>>> Glenn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> glenn.wagner@shapeblue.com
>>>>>>> www.shapeblue.com
>>>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape 
>>>>>>> Town 7130South Africa @shapeblue
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>>>> To: users@cloudstack.apache.org
>>>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>>
>>>>>>> hi,
>>>>>>>
>>>>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>>>>
>>>>>>> When trying to launch the first VM, the VS is created. VS starts 
>>>>>>> up, but in CS, it stuck with "starting" state.
>>>>>>>
>>>>>>> i can't find any usefull information in the logs.
>>>>>>>
>>>>>>> any hint?
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> DISCLAIMER
>>>>> ==========
>>>>> This e-mail may contain privileged and confidential information 
>>>>> which
>> is the property of Accelerite, a Persistent Systems business. It is 
>> intended only for the use of the individual or entity to which it is 
>> addressed. If you are not the intended recipient, you are not 
>> authorized to read, retain, copy, print, distribute or use this 
>> message. If you have received this communication in error, please 
>> notify the sender and delete all copies of this message. Accelerite, 
>> a Persistent Systems business does not accept any liability for virus infected mails.
>>>>>
>>
> 

Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by ilya <il...@gmail.com>.
Darren

I'm also running 4.5.2 and like the stability we get with it.

For the features we need, 4.5.2 has everything that is required, so I
don't see a huge benefit in upgrading to the latest ACS at the moment. Also,
our environments are very large and complex, so an upgrade is not something
I can take lightly.

With that said, I do have a small 8-node lab environment I can try the
upgrade on. It consists of 4 ESXi and 4 KVM nodes, so it should be a
fair test.

Let's wait for Jacob to respond with his test of setting up the IP/netmask
for eth1 on the router VM. If it does not help, I'll try the upgrade to see
if I can reproduce the issue.
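
For reference, the eth1 check could be scripted roughly like this. It is
only a sketch: it assumes the cmdline is a single line of space-separated
key=value pairs, as in Jacob's paste, and the function names
get_cmdline_value and check_control_ip are made up for illustration (they
are not part of the systemvm scripts).

```sh
#!/bin/sh
# Sketch only: both function names below are hypothetical helpers.

# Print the value of KEY ($1) from a "key=value key=value ..." string ($2).
# Returns 1 if the key is not present.
get_cmdline_value() {
    key="$1"
    # $2 is deliberately unquoted so the shell splits it on whitespace.
    for token in $2; do
        case "$token" in
            "$key"=*)
                printf '%s\n' "${token#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Report whether the control NIC (eth1) got a usable address.
# $1 is the cmdline string; returns 0 if set, 1 if 0.0.0.0, 2 if missing.
check_control_ip() {
    ip=$(get_cmdline_value eth1ip "$1") || { echo "eth1ip not found"; return 2; }
    if [ "$ip" = "0.0.0.0" ]; then
        echo "eth1ip is unset (0.0.0.0)"
        return 1
    fi
    echo "eth1ip=$ip"
}
```

On the router VM it could be called as
check_control_ip "$(cat /var/cache/cloud/cmdline)"; a result of 0.0.0.0
would match what Jacob is seeing.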

Regards
ilya

On 7/28/16 9:43 PM, Darren Tang wrote:
> Hi ilya:
>  I can confirm that issue; please check:
> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>  When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a basic
> zone, the VR never left the "Starting" state. Falling back to 4.5 works
> fine.
>  Maybe you can test it yourself.
> 
> 2016-07-29 3:24 GMT+08:00 ilya <il...@gmail.com>:
> 
>> I guess it would help to know what type of zone you use?
>>
>> Is it advanced, isolated vpc or shared network? what type of isolation?
>> or perhaps basic zone?
>>
>> Lastly, try stopping the iptables and restarting cloud agent (via stop
>> and start)
>>
>> Please see my response in-line
>>
>> On 7/28/16 6:58 AM, Jacob Seeley wrote:
>>> Hi ilya,
>>>
>>> Funny you brought up debugging the router VM. After I responded
>> yesterday, I did just that and I did find some odd things.
>>> Just to be clear (I think we're on the same page), since I'm not the OP
>> of this thread, the virtual router always gets deployed and it starts up
>> just fine; however, CloudStack reports that it's always stuck in starting.
>> VMs that get deployed ultimately fail. CloudStack reports the router
>> version as UNKNOWN.
>>> Before I provide what I found debugging the router VM, I'll address some
>> of your points.
>>>
>>> ### FOLLOW-UP QUESTIONS ###
>>>
>>> " Another reason would be an issue of hypervisor accessing the NFS mount
>> used for secondary storage."
>>> I don't believe this is an issue. The hypervisor (VMware) does mount the
>> secondary storage via NFS just fine. If this were an issue, I would think
>> the Secondary Storage and Console VMs would not deploy.
>>>
>>> " Use console of vCenter to see what is happening on router vm. You can
>> login locally with root/password and see the content of /var/log/cloud.out
>> file, paste it on pastebin - if it makes no sense to you..."
>>> It looks to me like /var/log/cloud.out is only written to when
>> $CLOUD_DEBUG is set to a non-zero-length value in the /etc/init.d/cloud script.
>> As such, there isn't even a /var/log/cloud.out file. Even when I set
>> that variable, I never get anything logged to /var/log/cloud.out. However,
>> there is a /var/log/cloud.log. Here are the contents of that:
>> http://pastebin.com/aaTsRKZE
>>>
>>> " you can also run /etc/init.d/cloud stop and start.. that will give you
>> a fresh start on logs.."
>>> The service is in a failed state. It's worth noting that this service is
>> in a started state on the Console and Secondary Storage VMs.
>>
>> This is concerning. I see you ran "sh -x"; read on..
>>
>>>
>>> " also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922.."
>>> It appears this is not an issue; see below:
>>
>> 3922 from MS to VR - this is the SSH daemon on VR with private key
>> 8250 from VR to MS - cloudstack java agent on VR talking to MS
>>
>>
>>>
>>> root@r-4-VM:~# telnet 10.70.110.101 8250
>>> Trying 10.70.110.101...
>>> Connected to 10.70.110.101.
>>> Escape character is '^]'.
>>>
>>
>>
>>> ### ROUTE VM DEBUG ###
>>>
>>> Here is what I found when the router VM gets deployed (please tell me if
>> anything seems off):
>>> 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an IP
>> address coming from the defaultGuestNetwork. NIC2 is traffic type Control
>> but has an IP address of 0.0.0.0
>>
>> Seeing 0.0.0.0 assigned to eth1 is a cause for concern.
>>
>> Let's assume NIC1 is eth0 and NIC2 is eth1.
>>
>> 1) We should not be getting 0.0.0.0 for eth1, aka the control network. This
>> IP should come from the POD network range, which is set when you add a pod.
>> I assume you did that as part of the Add Zone wizard...
>>
>> To see the POD IP range, go to the UI:
>> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
>> (assuming you did not create anything special), Management, IP Ranges.
>> You should see a range defined there, and it should not be 0.0.0.0...
>>
>>> From the CloudStack management server, I cannot SSH into the router VM
>> on NIC1. I've found this is because of iptables rules on the router VM. If
>> I issue a /etc/init.d/iptables-persistent flush on the router VM, I can SSH
>> into the router VM using the SSH key at port 3922.
>>> The service "cloud" is in a failed state. Looking at the cloud init
>> script, I see the following:
>>>
>>> CMDLINE=$(cat /var/cache/cloud/cmdline)
>>>
>>> TYPE="router"
>>> for i in $CMDLINE
>>>   do
>>>     # search for foo=bar pattern and cut out foo
>>>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>>>     case $FIRSTPATTERN in
>>>       type)
>>>           TYPE=$(echo $i | cut -d= -f2)
>>>       ;;
>>>     esac
>>> done
>>>
>>> The file /var/cache/cloud/cmdline exists; here are the contents:
>>>
>>> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0
>> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
>> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>>
>>
>>
>> You can also try updating /var/cache/cloud/cmdline with proper values
>> in place of eth1ip=0.0.0.0 eth1mask=0.0.0.0. You can look them up under
>> Infrastructure, Routers, r-4, NICs; look for the control NIC.
>>
>> Then try starting the cloud service..
>>
>> Also, did you enable baremetal support? can you deploy a zone without
>> baremetal support? Perhaps there is a bug on how IPs are assigned to
>> eth1 (control nic)...
>>
>>
>>> The previous code suggests that the value of TYPE starts as router but
>> will get set to dhcpsrvr, as indicated by the contents of
>> /var/cache/cloud/cmdline. Is this normal?
>>> Further down the script, I see:
>>>
>>> CLOUDSTACK_HOME="/usr/local/cloud"
>> <----------------------------------------Exists
>>> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];
>> <----------------------------------------Does not exist. Seems odd!
>>> then
>>>   . $CLOUDSTACK_HOME/systemvm/utils.sh
>>> else
>>>   _failure
>>> fi
>>>
>>> # mkdir -p /var/log/vmops
>>>
>>> start() {
>>>    local pid=$(get_pids)
>>>    if [ "$pid" != "" ]; then
>>>        echo "CloudStack cloud sevice is already running, PID = $pid"
>>>        return 0
>>>    fi
>>>
>>>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>>>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];
>> <------------------------------------------------------Does not exist.
>> Seems odd!
>>>    then
>>>      if [ "$pid" == "" ]
>>>      then
>>>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>>>        pid=$(get_pids)
>>>        echo $pid > /var/run/cloud.pid
>>>      fi
>>>      _success
>>>    else
>>>      _failure
>>>    fi
>>>    echo
>>>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
>>> }
>>>
>>> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
>> exists; however, the script then looks for the file
>> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It is also
>> supposed to start the script run.sh, but that also doesn't exist. This
>> seems like a problem to me.
>>> Here you can see the step-through when I try to start the cloud service:
>>>
>>> sh -x /etc/init.d/cloud start
>>> + ENABLED=0
>>> + [ -e /etc/default/cloud ]
>>> + . /etc/default/cloud
>>> + ENABLED=0
>>> + cat /var/cache/cloud/cmdline
>>> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
>> eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com
>> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
>> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
>> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + [ ! -z ]
>>> + LOG_FILE=/dev/null
>>> + TYPE=router
>>> + cut -d= -f1
>>> + echo template=domP
>>> + FIRSTPATTERN=template
>>> + cut -d= -f1
>>> + echo name=r-4-VM
>>> + FIRSTPATTERN=name
>>> + cut -d= -f1
>>> + echo eth0ip=10.70.116.75
>>> + FIRSTPATTERN=eth0ip
>>> + cut -d= -f1
>>> + echo eth0mask=255.255.255.0
>>> + FIRSTPATTERN=eth0mask
>>> + cut -d= -f1
>>> + echo gateway=10.70.116.1
>>> + FIRSTPATTERN=gateway
>>> + cut -d= -f1
>>> + echo domain=vit.vertitechit.com
>>> + FIRSTPATTERN=domain
>>> + cut -d= -f1
>>> + echo cidrsize=24
>>> + FIRSTPATTERN=cidrsize
>>> + cut -d= -f1
>>> + echo dhcprange=10.70.116.1
>>> + FIRSTPATTERN=dhcprange
>>> + cut -d= -f1
>>> + echo eth1ip=0.0.0.0
>>> + FIRSTPATTERN=eth1ip
>>> + cut -d= -f1
>>> + echo eth1mask=0.0.0.0
>>> + FIRSTPATTERN=eth1mask
>>> + cut -d= -f1
>>> + echo mgmtcidr=10.70.110.0/24
>>> + FIRSTPATTERN=mgmtcidr
>>> + cut -d= -f1
>>> + echo localgw=10.70.116.1
>>> + FIRSTPATTERN=localgw
>>> + cut -d= -f1
>>> + echo sshonguest=true
>>> + FIRSTPATTERN=sshonguest
>>> + cut -d= -f1
>>> + echo type=dhcpsrvr
>>> + FIRSTPATTERN=type
>>> + cut -d= -f2
>>> + echo type=dhcpsrvr
>>> + TYPE=dhcpsrvr
>>> + cut -d= -f1
>>> + echo disable_rp_filter=true
>>> + FIRSTPATTERN=disable_rp_filter
>>> + cut -d= -f1
>>> + echo extra_pubnics=2
>>> + FIRSTPATTERN=extra_pubnics
>>> + cut -d= -f1
>>> + echo dns1=10.70.10.21
>>> + FIRSTPATTERN=dns1
>>> + cut -d= -f1
>>> + echo
>> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
>>> + FIRSTPATTERN=baremetalnotificationsecuritykey
>>> + cut -d= -f1
>>> + echo
>> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
>>> + FIRSTPATTERN=baremetalnotificationapikey
>>> + cut -d= -f1
>>> + echo host=10.70.110.101
>>> + FIRSTPATTERN=host
>>> + cut -d= -f1
>>> + echo port=8080
>>> + FIRSTPATTERN=port
>>> + cut -d= -f1
>>> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
>>> + FIRSTPATTERN=nic_macs
>>> + [ -f /etc/init.d/functions ]
>>> + [ -f ./lib/lsb/init-functions ]
>>> + RETVAL=0
>>> + CLOUDSTACK_HOME=/usr/local/cloud
>>> + [ -f /usr/local/cloud/systemvm/utils.sh ]
>>> + _failure
>>> + [ -f /etc/init.d/functions ]
>>> + echo Failed
>>> Failed
>>> + [ 0 != 0 ]
>>> + exit 0
>>>
>>> Thoughts?
>>>
>>> Jacob Seeley
>>> Sr. Infrastructure Engineer
>>> VertitechIT
>>> 413-268-1631
>>>
>>> www.vertitechit.com
>>>
>>> -----Original Message-----
>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>> Sent: Wednesday, July 27, 2016 8:43 PM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>
>>> Hi Jacob
>>>
>>> I gave this a second read - if your issue is Router VM in starting mode
>>> - but not started - it means cloudstack agent on routerVM cannot talk to
>> management server on 8250 over POD network.
>>>
>>> Another reason would be an issue of hypervisor accessing the NFS mount
>> used for secondary storage.
>>>
>>> Use console of vCenter to see what is happening on router vm. You can
>> login locally with root/password and see the content of /var/log/cloud.out
>> file, paste it on pastebin - if it makes no sense to you...
>>>
>>> you can also run /etc/init.d/cloud stop and start.. that will give you a
>> fresh start on logs..
>>>
>>> also, confirm that management server can talk to VR on POD IP
>>> (management) on port 3922..
>>>
>>> Regards
>>> ilya
>>>
>>> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>>>> ilya,
>>>>
>>>> Here are the contents of the secondary storage:
>>>>
>>>> .
>>>> ./template
>>>> ./template/tmpl
>>>> ./template/tmpl/1
>>>> ./template/tmpl/1/8
>>>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>>>> ./template/tmpl/1/8/template.properties
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmw
>>>> are.ovf
>>>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmw
>>>> are-disk3.vmdk
>>>> ./template/tmpl/1/7
>>>> ./template/tmpl/1/7/template.properties
>>>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>>>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>>>> ./systemvm
>>>> ./systemvm/systemvm-4.8.0.1.iso
>>>> ./systemvm/.lck-bf162a0100000000
>>>> ./snapshots
>>>> ./volumes
>>>>
>>>> I've noticed that both the Secondary Storage VM and Console Proxy VM
>> mount this ISO and as stated before, they come up just fine.
>>>>
>>>> Regards,
>>>>
>>>> Jacob Seeley
>>>> Sr. Infrastructure Engineer
>>>> VertitechIT
>>>> 413-268-1631
>>>>
>>>> www.vertitechit.com
>>>>
>>>> -----Original Message-----
>>>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>>>> Sent: Wednesday, July 27, 2016 3:22 AM
>>>> To: users@cloudstack.apache.org
>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>
>>>> Jacob
>>>>
>>>> The upgrade usually occurs though systemvm.iso - that is generated by
>> cloudstack on the first start.
>>>>
>>>> Please show the content of your secondary store specifically
>>>>
>>>> /mnt/[secondary-storage]/systemvm
>>>>
>>>> Regards
>>>> ilya
>>>>
>>>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>>>> Here is a pastebin snippet the management-server.log -
>>>>> http://pastebin.com/GCLm53Gz
>>>>>
>>>>> Hopefully the relevant data is in there.
>>>>>
>>>>> I made sure to start from scratch for this example. Everything from
>> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is
>> fresh. I deployed a new instance in CloudStack, a VM internally named
>> i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to
>> deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Jacob Seeley
>>>>> Sr. Infrastructure Engineer
>>>>> VertitechIT
>>>>> 413-268-1631
>>>>>
>>>>> www.vertitechit.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>>>> Sent: Monday, July 25, 2016 1:37 AM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> please upload the logs in the issue.
>>>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <da...@gmail.com>
>> wrote:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>>>
>>>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> What template are you using to start your first VM? - the default
>>>>>>> vmware template?
>>>>>>> If you look in vcenter , what does the console show you ?
>>>>>>>
>>>>>>>
>>>>>>> Glenn
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> glenn.wagner@shapeblue.com
>>>>>>> www.shapeblue.com
>>>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
>>>>>>> 7130South Africa @shapeblue
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>>>> To: users@cloudstack.apache.org
>>>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>>>
>>>>>>> hi,
>>>>>>>
>>>>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>>>>
>>>>>>> When trying to launch the first VM, the VS is created. VS starts
>>>>>>> up, but in CS, it stuck with "starting" state.
>>>>>>>
>>>>>>> i can't find any usefull information in the logs.
>>>>>>>
>>>>>>> any hint?
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>
> 

Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Darren Tang <da...@gmail.com>.
Hi ilya:
 I can confirm that issue; please check:
https://issues.apache.org/jira/browse/CLOUDSTACK-9144
 When we deployed CloudStack (4.6/4.7/4.8) with VMware (5.x/6.0) in a basic
zone, the VR never left the "Starting" state. Falling back to 4.5 works
fine.
 Maybe you can test it yourself.
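
 If you do test it, one quick check on the router VM is whether the files
that /etc/init.d/cloud expects are present, since Jacob found utils.sh and
run.sh missing. A rough sketch only; the function name
missing_systemvm_files is hypothetical, and the default directory is the
one quoted in this thread:

```sh
#!/bin/sh
# Sketch only: missing_systemvm_files is a hypothetical helper name.
# Lists which of the scripts the init script looks for are absent.
# $1 is the directory to inspect (defaults to the path quoted in this thread).
# Returns 0 if both files exist, 1 if at least one is missing.
missing_systemvm_files() {
    dir="${1:-/usr/local/cloud/systemvm}"
    rc=0
    for f in utils.sh run.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

 Running missing_systemvm_files with no argument on the VR, and again on
the Console Proxy or Secondary Storage VM, would show whether the two
differ.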

2016-07-29 3:24 GMT+08:00 ilya <il...@gmail.com>:

> I guess it would help to know what type of zone you use?
>
> Is it advanced, isolated vpc or shared network? what type of isolation?
> or perhaps basic zone?
>
> Lastly, try stopping the iptables and restarting cloud agent (via stop
> and start)
>
> Please see my response in-line
>
> On 7/28/16 6:58 AM, Jacob Seeley wrote:
> > Hi ilya,
> >
> > Funny you brought up debugging the router VM. After I responded
> yesterday, I did just that and I did find some odd things.
> > Just to be clear (I think we're on the same page), since I'm not the OP
> of this thread, the virtual router always gets deployed and it starts up
> just fine; however, CloudStack reports that it's always stuck in starting.
> VMs that get deployed ultimately fail. CloudStack reports the router
> version as UNKNOWN.
> > Before I provide what I found debugging the router VM, I'll address some
> of your points.
> >
> > ### FOLLOW-UP QUESTIONS ###
> >
> > " Another reason would be an issue of hypervisor accessing the NFS mount
> used for secondary storage."
> > I don't believe this is an issue. The hypervisor (VMware) does mount the
> secondary storage via NFS just fine. If this were an issue, I would think
> the Secondary Storage and Console VMs would not deploy.
> >
> > " Use console of vCenter to see what is happening on router vm. You can
> login locally with root/password and see the content of /var/log/cloud.out
> file, paste it on pastebin - if it makes no sense to you..."
> > It looks to me like /var/log/cloud.out is only written to when
> $CLOUD_DEBUG is set to a non-zero-length value in the /etc/init.d/cloud script.
> As such, there isn't even a /var/log/cloud.out file. Even when I set
> that variable, I never get anything logged to /var/log/cloud.out. However,
> there is a /var/log/cloud.log. Here are the contents of that:
> http://pastebin.com/aaTsRKZE
> >
> > " you can also run /etc/init.d/cloud stop and start.. that will give you
> a fresh start on logs.."
> > The service is in a failed state. It's worth noting that this service is
> in a started state on the Console and Secondary Storage VMs.
>
> This is concerning. I see you ran "sh -x"; read on..
>
> >
> > " also, confirm that management server can talk to VR on POD IP
> > (management) on port 3922.."
> > It appears this is not an issue; see below:
>
> 3922 from MS to VR - this is the SSH daemon on VR with private key
> 8250 from VR to MS - cloudstack java agent on VR talking to MS
>
>
> >
> > root@r-4-VM:~# telnet 10.70.110.101 8250
> > Trying 10.70.110.101...
> > Connected to 10.70.110.101.
> > Escape character is '^]'.
> >
>
>
> > ### ROUTE VM DEBUG ###
> >
> > Here is what I found when the router VM gets deployed (please tell me if
> anything seems off):
> > 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an IP
> address coming from the defaultGuestNetwork. NIC2 is traffic type Control
> but has an IP address of 0.0.0.0
>
> Seeing 0.0.0.0 assigned to eth1 is a cause for concern.
>
> Let's assume NIC1 is eth0 and NIC2 is eth1.
>
> 1) We should not be getting 0.0.0.0 for eth1, aka the control network. This
> IP should come from the POD network range, which is set when you add a pod.
> I assume you did that as part of the Add Zone wizard...
>
> To see the POD IP range, go to the UI:
> Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
> (assuming you did not create anything special), Management, IP Ranges.
> You should see a range defined there, and it should not be 0.0.0.0...
>
> > From the CloudStack management server, I cannot SSH into the router VM
> on NIC1. I've found this is because of iptables rules on the router VM. If
> I issue a /etc/init.d/iptables-persistent flush on the router VM, I can SSH
> into the router VM using the SSH key at port 3922.
> > The service "cloud" is in a failed state. Looking at the cloud init
> script, I see the following:
> >
> > CMDLINE=$(cat /var/cache/cloud/cmdline)
> >
> > TYPE="router"
> > for i in $CMDLINE
> >   do
> >     # search for foo=bar pattern and cut out foo
> >     FIRSTPATTERN=$(echo $i | cut -d= -f1)
> >     case $FIRSTPATTERN in
> >       type)
> >           TYPE=$(echo $i | cut -d= -f2)
> >       ;;
> >     esac
> > done
> >
> > The file /var/cache/cloud/cmdline exists; here are the contents:
> >
> > template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0
> gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24
> dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> >
>
>
> You can also try updating /var/cache/cloud/cmdline with proper values
> in place of eth1ip=0.0.0.0 eth1mask=0.0.0.0. You can look them up under
> Infrastructure, Routers, r-4, NICs; look for the control NIC.
>
> Then try starting the cloud service..
>
> Also, did you enable baremetal support? can you deploy a zone without
> baremetal support? Perhaps there is a bug on how IPs are assigned to
> eth1 (control nic)...
>
>
> > The previous code suggests that the value of TYPE starts as router but
> will get set to dhcpsrvr, as indicated by the contents of
> /var/cache/cloud/cmdline. Is this normal?
> > Further down the script, I see:
> >
> > CLOUDSTACK_HOME="/usr/local/cloud"
> <----------------------------------------Exists
> > if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ];
> <----------------------------------------Does not exist. Seems odd!
> > then
> >   . $CLOUDSTACK_HOME/systemvm/utils.sh
> > else
> >   _failure
> > fi
> >
> > # mkdir -p /var/log/vmops
> >
> > start() {
> >    local pid=$(get_pids)
> >    if [ "$pid" != "" ]; then
> >        echo "CloudStack cloud sevice is already running, PID = $pid"
> >        return 0
> >    fi
> >
> >    echo -n "Starting CloudStack cloud service (type=$TYPE) "
> >    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ];
> <------------------------------------------------------Does not exist.
> Seems odd!
> >    then
> >      if [ "$pid" == "" ]
> >      then
> >        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
> >        pid=$(get_pids)
> >        echo $pid > /var/run/cloud.pid
> >      fi
> >      _success
> >    else
> >      _failure
> >    fi
> >    echo
> >    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
> > }
> >
> > I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder
> exists; however, the script then looks for the file
> /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It also looks
> is supposed to start the script run.sh but that also doesn't exist. This
> seems like a problem to me.
> > Here you can see step through when I try to start the cloud service:
> >
> > sh -x /etc/init.d/cloud start
> > + ENABLED=0
> > + [ -e /etc/default/cloud ]
> > + . /etc/default/cloud
> > + ENABLED=0
> > + cat /var/cache/cloud/cmdline
> > + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75
> eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com
> cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=
> 10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr
> disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21
> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> > + [ ! -z ]
> > + LOG_FILE=/dev/null
> > + TYPE=router
> > + cut -d= -f1
> > + echo template=domP
> > + FIRSTPATTERN=template
> > + cut -d= -f1
> > + echo name=r-4-VM
> > + FIRSTPATTERN=name
> > + cut -d= -f1
> > + echo eth0ip=10.70.116.75
> > + FIRSTPATTERN=eth0ip
> > + cut -d= -f1
> > + echo eth0mask=255.255.255.0
> > + FIRSTPATTERN=eth0mask
> > + cut -d= -f1
> > + echo gateway=10.70.116.1
> > + FIRSTPATTERN=gateway
> > + cut -d= -f1
> > + echo domain=vit.vertitechit.com
> > + FIRSTPATTERN=domain
> > + cut -d= -f1
> > + echo cidrsize=24
> > + FIRSTPATTERN=cidrsize
> > + cut -d= -f1
> > + echo dhcprange=10.70.116.1
> > + FIRSTPATTERN=dhcprange
> > + cut -d= -f1
> > + echo eth1ip=0.0.0.0
> > + FIRSTPATTERN=eth1ip
> > + cut -d= -f1
> > + echo eth1mask=0.0.0.0
> > + FIRSTPATTERN=eth1mask
> > + cut -d= -f1
> > + echo mgmtcidr=10.70.110.0/24
> > + FIRSTPATTERN=mgmtcidr
> > + cut -d= -f1
> > + echo localgw=10.70.116.1
> > + FIRSTPATTERN=localgw
> > + cut -d= -f1
> > + echo sshonguest=true
> > + FIRSTPATTERN=sshonguest
> > + cut -d= -f1
> > + echo type=dhcpsrvr
> > + FIRSTPATTERN=type
> > + cut -d= -f2
> > + echo type=dhcpsrvr
> > + TYPE=dhcpsrvr
> > + cut -d= -f1
> > + echo disable_rp_filter=true
> > + FIRSTPATTERN=disable_rp_filter
> > + cut -d= -f1
> > + echo extra_pubnics=2
> > + FIRSTPATTERN=extra_pubnics
> > + cut -d= -f1
> > + echo dns1=10.70.10.21
> > + FIRSTPATTERN=dns1
> > + cut -d= -f1
> > + echo
> baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> > + FIRSTPATTERN=baremetalnotificationsecuritykey
> > + cut -d= -f1
> > + echo
> baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> > + FIRSTPATTERN=baremetalnotificationapikey
> > + cut -d= -f1
> > + echo host=10.70.110.101
> > + FIRSTPATTERN=host
> > + cut -d= -f1
> > + echo port=8080
> > + FIRSTPATTERN=port
> > + cut -d= -f1
> > + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> > + FIRSTPATTERN=nic_macs
> > + [ -f /etc/init.d/functions ]
> > + [ -f ./lib/lsb/init-functions ]
> > + RETVAL=0
> > + CLOUDSTACK_HOME=/usr/local/cloud
> > + [ -f /usr/local/cloud/systemvm/utils.sh ]
> > + _failure
> > + [ -f /etc/init.d/functions ]
> > + echo Failed
> > Failed
> > + [ 0 != 0 ]
> > + exit 0
> >
> > Thoughts?
> >
> > Jacob Seeley
> > Sr. Infrastructure Engineer
> > VertitechIT
> > 413-268-1631
> >
> > www.vertitechit.com
> >
> > -----Original Message-----
> > From: ilya [mailto:ilya.mailing.lists@gmail.com]
> > Sent: Wednesday, July 27, 2016 8:43 PM
> > To: users@cloudstack.apache.org
> > Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >
> > Hi Jacob
> >
> > I gave this a second read - if your issue is Router VM in starting mode
> > - but not started - it means cloudstack agent on routerVM cannot talk to
> management server on 8250 over POD network.
> >
> > Another reason would be an issue of hypervisor accessing the NFS mount
> used for secondary storage.
> >
> > Use console of vCenter to see what is happening on router vm. You can
> login locally with root/password and see the content of /var/log/cloud.out
> file, paste it on pastebin - if it makes no sense to you...
> >
> > you can also run /etc/init.d/cloud stop and start.. that will give you a
> fresh start on logs..
> >
> > also, confirm that management server can talk to VR on POD IP
> > (management) on port 3922..
> >
> > Regards
> > ilya
> >
> > On 7/27/16 9:34 AM, Jacob Seeley wrote:
> >> ilya,
> >>
> >> Here are the contents of the secondary storage:
> >>
> >> .
> >> ./template
> >> ./template/tmpl
> >> ./template/tmpl/1
> >> ./template/tmpl/1/8
> >> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
> >> ./template/tmpl/1/8/template.properties
> >> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
> >> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
> >> ./template/tmpl/1/7
> >> ./template/tmpl/1/7/template.properties
> >> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
> >> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
> >> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
> >> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
> >> ./systemvm
> >> ./systemvm/systemvm-4.8.0.1.iso
> >> ./systemvm/.lck-bf162a0100000000
> >> ./snapshots
> >> ./volumes
> >>
> >> I've noticed that both the Secondary Storage VM and Console Proxy VM
> mount this ISO and as stated before, they come up just fine.
> >>
> >> Regards,
> >>
> >> Jacob Seeley
> >> Sr. Infrastructure Engineer
> >> VertitechIT
> >> 413-268-1631
> >>
> >> www.vertitechit.com
> >>
> >> -----Original Message-----
> >> From: ilya [mailto:ilya.mailing.lists@gmail.com]
> >> Sent: Wednesday, July 27, 2016 3:22 AM
> >> To: users@cloudstack.apache.org
> >> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >>
> >> Jacob
> >>
> >> The upgrade usually occurs though systemvm.iso - that is generated by
> cloudstack on the first start.
> >>
> >> Please show the content of your secondary store specifically
> >>
> >> /mnt/[secondary-storage]/systemvm
> >>
> >> Regards
> >> ilya
> >>
> >> On 7/25/16 11:19 AM, Jacob Seeley wrote:
> >>> Here is a pastebin snippet the management-server.log -
> >>> http://pastebin.com/GCLm53Gz
> >>>
> >>> Hopefully the relevant data is in there.
> >>>
> >>> I made sure to start from scratch for this example. Everything from
> the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is
> fresh. I deployed a new instance in CloudStack, a VM internally named
> i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to
> deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
> >>>
> >>> Thank you,
> >>>
> >>> Jacob Seeley
> >>> Sr. Infrastructure Engineer
> >>> VertitechIT
> >>> 413-268-1631
> >>>
> >>> www.vertitechit.com
> >>>
> >>> -----Original Message-----
> >>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
> >>> Sent: Monday, July 25, 2016 1:37 AM
> >>> To: users@cloudstack.apache.org
> >>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> >>>
> >>> please upload the logs in the issue.
> >>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <da...@gmail.com>
> wrote:
> >>>>
> >>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
> >>>>
> >>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> What template are you using to start your first VM? - the default
> >>>>> vmware template?
> >>>>> If you look in vcenter , what does the console show you ?
> >>>>>
> >>>>>
> >>>>> Glenn
> >>>>>
> >>>>>
> >>>>>
> >>>>> glenn.wagner@shapeblue.com
> >>>>> www.shapeblue.com
> >>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
> >>>>> 7130South Africa @shapeblue
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Pascal R. [mailto:repa182@gmail.com]
> >>>>> Sent: Monday, 04 July 2016 1:26 PM
> >>>>> To: users@cloudstack.apache.org
> >>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
> >>>>>
> >>>>> hi,
> >>>>>
> >>>>> we have a CS4.8 deployment with VMWare 5.5.
> >>>>>
> >>>>> When trying to launch the first VM, the VS is created. VS starts
> >>>>> up, but in CS, it stuck with "starting" state.
> >>>>>
> >>>>> i can't find any usefull information in the logs.
> >>>>>
> >>>>> any hint?
> >>>>>
> >>>
> >>>
> >>>
> >>>
> >>> DISCLAIMER
> >>> ==========
> >>> This e-mail may contain privileged and confidential information which
> is the property of Accelerite, a Persistent Systems business. It is
> intended only for the use of the individual or entity to which it is
> addressed. If you are not the intended recipient, you are not authorized to
> read, retain, copy, print, distribute or use this message. If you have
> received this communication in error, please notify the sender and delete
> all copies of this message. Accelerite, a Persistent Systems business does
> not accept any liability for virus infected mails.
> >>>
>

Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by ilya <il...@gmail.com>.
It would help to know what type of zone you are using.

Is it an advanced zone (isolated, VPC, or shared network)? What type of
isolation? Or is it a basic zone?

Lastly, try stopping iptables and restarting the cloud agent (via stop
and start).

Please see my responses inline.

On 7/28/16 6:58 AM, Jacob Seeley wrote:
> Hi ilya,
> 
> Funny you brought up debugging the router VM. After I responding yesterday, I did just that and I did find some odd things. 
> Just to be clear (I think we're on the same page), since I'm not the OP of this thread, the virtual router always gets deployed and it starts up just fine; however, CloudStack reports that it's always stuck in starting. VMs that get deployed ultimately fail. CloudStack reports the router version as UNKNOWN.
> Before I provide what I found debugging the router VM, I'll address some of your points.
> 
> ### FOLLOW-UP QUESTIONS ###
> 
> " Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage."
> I don't believe this is an issue. The hypervisor (VMware) does mount the secondary storage via NFS just fine. If this were an issue, I would think the Secondary Storage and Console VMs would not deploy.
> 
> " Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
> It looks like to me that /var/log/cloud.out is only logged to when $CLOUD_DEBUG is set to a non-zero length in the /etc/init.d/cloud script. As such, there isn't even a file for /var/log/cloud.out. Even when I set that variable, I never get anything logged to /var/log/cloud.out. However, there is a /var/log/cloud.log. Here is the contents of that: http://pastebin.com/aaTsRKZE
> 
> " you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs.."
> The service is in a failed state. It's worth noting that this service is in a started state on the Console and Secondary Storage VMs.

This is concerning. I see you ran "sh -x"; read on.

> 
> " also, confirm that management server can talk to VR on POD IP
> (management) on port 3922.."
> It appears this is not an issue; see below:

Note the two ports and their directions:
3922 from MS to VR: the SSH daemon on the VR, accessed with the private key
8250 from VR to MS: the CloudStack Java agent on the VR talking to the MS
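
A minimal sketch for checking each direction from the side that opens the
connection. It assumes bash and the coreutils "timeout" command are available
on the machine you run it from (the system VM image may not ship both, in
which case telnet does the same job). The function name port_open is made up
for illustration, 10.70.110.101 is the management server address from this
thread, and <VR control IP> is a placeholder.

```shell
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be opened.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
port_open() {
    timeout 3 bash -c ': </dev/tcp/"$1"/"$2"' _ "$1" "$2" 2>/dev/null
}

# On the VR: can the agent reach the management server?
#   port_open 10.70.110.101 8250 && echo "MS reachable on 8250"
# On the management server: can it reach the VR's SSH daemon?
#   port_open <VR control IP> 3922 && echo "VR reachable on 3922"
```

A successful check on 8250 says nothing about 3922, and the reverse is also
true, since the two connections are opened from opposite ends.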


> 
> root@r-4-VM:~# telnet 10.70.110.101 8250
> Trying 10.70.110.101...
> Connected to 10.70.110.101.
> Escape character is '^]'.
> 


> ### ROUTE VM DEBUG ###
> 
> Here is what I found with router VM gets deployed (please tell me if anything seems off):
> 2 NICs; only one NIC gets an IP  address. CloudStack NIC1 shows an IP address coming from the defaultGuestNetwork. NIC2 is traffic type Control but has an IP address of 0.0.0.0

Seeing 0.0.0.0 assigned to eth1 is a cause for concern.

Let's assume NIC1 is eth0 and NIC2 is eth1.

1) We should not be getting 0.0.0.0 for eth1, i.e. the control network. This
IP should come from the pod network range. I assume you defined that range
when you added the pod as part of the Add Zone wizard.

To see the pod IP range, go to the UI:
Infrastructure, Zones, Your Zone, Physical Network, Physical Network 1
(assuming you did not create anything special), Management, IP Ranges.
You should see a range defined there, and it should not be 0.0.0.0.

> From the CloudStack management server, I cannot SSH into the router VM on NIC1. I've found this is because of iptables rules on the router VM. If I issue a /etc/init.d/iptables-persistent flush on the router VM, I can SSH into the router VM using the SSH key at port 3922.
> The service "cloud" is in a failed state. Looking at the cloud init script, I see the following:
> 
> CMDLINE=$(cat /var/cache/cloud/cmdline)
> 
> TYPE="router"
> for i in $CMDLINE
>   do
>     # search for foo=bar pattern and cut out foo
>     FIRSTPATTERN=$(echo $i | cut -d= -f1)
>     case $FIRSTPATTERN in 
>       type)
>           TYPE=$(echo $i | cut -d= -f2)
>       ;;
>     esac
> done
> 
> The file cat /var/cache/cloud/cmdline exist; here are the contents:
> 
> template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> 


You can also try updating /var/cache/cloud/cmdline with proper values
in place of eth1ip=0.0.0.0 eth1mask=0.0.0.0. You can look them up under
Infrastructure, Routers, r-4, NICs; look for the control NIC.

Then try starting the cloud service.

Also, did you enable baremetal support? Can you deploy a zone without
baremetal support? Perhaps there is a bug in how IPs are assigned to
eth1 (the control NIC).
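
To illustrate the cmdline edit, here is a minimal sketch. The helper name
set_cmdline_value and the address 10.70.110.50 / mask 255.255.255.0 are
made-up placeholders, not values from your setup; use whatever the control
NIC shows in the UI. It assumes the file is a single line of space-separated
key=value pairs, as in your paste. I would expect the file to be rewritten on
the next boot, so treat this as a test rather than a fix.

```shell
# Hypothetical helper: replace the value of one key in a space-separated
# key=value line read from stdin, printing the result to stdout.
set_cmdline_value() {
    key=$1
    value=$2
    sed -E "s#(^| )${key}=[^ ]*#\\1${key}=${value}#"
}

# Example on a shortened copy of the line (placeholder address and mask):
line='name=r-4-VM eth1ip=0.0.0.0 eth1mask=0.0.0.0 type=dhcpsrvr'
printf '%s\n' "$line" \
    | set_cmdline_value eth1ip 10.70.110.50 \
    | set_cmdline_value eth1mask 255.255.255.0
```

On the real file you would redirect the output to a temporary file and copy
it back over /var/cache/cloud/cmdline, keeping a backup of the original.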


> The previous code suggests that the value of TYPE starts as router but will get set to dhcpsrvr, as indicated by the contents of /var/cache/cloud/cmdline. Is this normal?
> Further down the script, I see:
> 
> CLOUDSTACK_HOME="/usr/local/cloud" <----------------------------------------Exists
> if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ]; <----------------------------------------Does not exist. Seems odd!
> then
>   . $CLOUDSTACK_HOME/systemvm/utils.sh
> else
>   _failure
> fi
> 
> # mkdir -p /var/log/vmops
> 
> start() {
>    local pid=$(get_pids)
>    if [ "$pid" != "" ]; then
>        echo "CloudStack cloud sevice is already running, PID = $pid"
>        return 0
>    fi
> 
>    echo -n "Starting CloudStack cloud service (type=$TYPE) "
>    if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ]; <------------------------------------------------------Does not exist. Seems odd!
>    then
>      if [ "$pid" == "" ]
>      then
>        (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
>        pid=$(get_pids)
>        echo $pid > /var/run/cloud.pid 
>      fi
>      _success
>    else
>      _failure
>    fi
>    echo
>    echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
> }
> 
> I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder exists; however, the script then looks for the file /usr/local/cloud/systemvm/utils.sh. This file doesn't exist. It also looks is supposed to start the script run.sh but that also doesn't exist. This seems like a problem to me.
> Here you can see step through when I try to start the cloud service:
> 
> sh -x /etc/init.d/cloud start
> + ENABLED=0
> + [ -e /etc/default/cloud ]
> + . /etc/default/cloud
> + ENABLED=0
> + cat /var/cache/cloud/cmdline
> + CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> + [ ! -z ]
> + LOG_FILE=/dev/null
> + TYPE=router
> + cut -d= -f1
> + echo template=domP
> + FIRSTPATTERN=template
> + cut -d= -f1
> + echo name=r-4-VM
> + FIRSTPATTERN=name
> + cut -d= -f1
> + echo eth0ip=10.70.116.75
> + FIRSTPATTERN=eth0ip
> + cut -d= -f1
> + echo eth0mask=255.255.255.0
> + FIRSTPATTERN=eth0mask
> + cut -d= -f1
> + echo gateway=10.70.116.1
> + FIRSTPATTERN=gateway
> + cut -d= -f1
> + echo domain=vit.vertitechit.com
> + FIRSTPATTERN=domain
> + cut -d= -f1
> + echo cidrsize=24
> + FIRSTPATTERN=cidrsize
> + cut -d= -f1
> + echo dhcprange=10.70.116.1
> + FIRSTPATTERN=dhcprange
> + cut -d= -f1
> + echo eth1ip=0.0.0.0
> + FIRSTPATTERN=eth1ip
> + cut -d= -f1
> + echo eth1mask=0.0.0.0
> + FIRSTPATTERN=eth1mask
> + cut -d= -f1
> + echo mgmtcidr=10.70.110.0/24
> + FIRSTPATTERN=mgmtcidr
> + cut -d= -f1
> + echo localgw=10.70.116.1
> + FIRSTPATTERN=localgw
> + cut -d= -f1
> + echo sshonguest=true
> + FIRSTPATTERN=sshonguest
> + cut -d= -f1
> + echo type=dhcpsrvr
> + FIRSTPATTERN=type
> + cut -d= -f2
> + echo type=dhcpsrvr
> + TYPE=dhcpsrvr
> + cut -d= -f1
> + echo disable_rp_filter=true
> + FIRSTPATTERN=disable_rp_filter
> + cut -d= -f1
> + echo extra_pubnics=2
> + FIRSTPATTERN=extra_pubnics
> + cut -d= -f1
> + echo dns1=10.70.10.21
> + FIRSTPATTERN=dns1
> + cut -d= -f1
> + echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
> + FIRSTPATTERN=baremetalnotificationsecuritykey
> + cut -d= -f1
> + echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
> + FIRSTPATTERN=baremetalnotificationapikey
> + cut -d= -f1
> + echo host=10.70.110.101
> + FIRSTPATTERN=host
> + cut -d= -f1
> + echo port=8080
> + FIRSTPATTERN=port
> + cut -d= -f1
> + echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
> + FIRSTPATTERN=nic_macs
> + [ -f /etc/init.d/functions ]
> + [ -f ./lib/lsb/init-functions ]
> + RETVAL=0
> + CLOUDSTACK_HOME=/usr/local/cloud
> + [ -f /usr/local/cloud/systemvm/utils.sh ]
> + _failure
> + [ -f /etc/init.d/functions ]
> + echo Failed
> Failed
> + [ 0 != 0 ]
> + exit 0
> 
> Thoughts?
> 
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> 
> www.vertitechit.com
> 
> -----Original Message-----
> From: ilya [mailto:ilya.mailing.lists@gmail.com] 
> Sent: Wednesday, July 27, 2016 8:43 PM
> To: users@cloudstack.apache.org
> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
> 
> Hi Jacob
> 
> I gave this a second read - if your issue is Router VM in starting mode
> - but not started - it means cloudstack agent on routerVM cannot talk to management server on 8250 over POD network.
> 
> Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage.
> 
> Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you...
> 
> you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs..
> 
> also, confirm that management server can talk to VR on POD IP
> (management) on port 3922..
> 
> Regards
> ilya
> 
> On 7/27/16 9:34 AM, Jacob Seeley wrote:
>> ilya,
>>
>> Here are the contents of the secondary storage:
>>
>> .
>> ./template
>> ./template/tmpl
>> ./template/tmpl/1
>> ./template/tmpl/1/8
>> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
>> ./template/tmpl/1/8/template.properties
>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
>> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
>> ./template/tmpl/1/7
>> ./template/tmpl/1/7/template.properties
>> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
>> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
>> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
>> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
>> ./systemvm
>> ./systemvm/systemvm-4.8.0.1.iso
>> ./systemvm/.lck-bf162a0100000000
>> ./snapshots
>> ./volumes
>>
>> I've noticed that both the Secondary Storage VM and Console Proxy VM mount this ISO and as stated before, they come up just fine.
>>
>> Regards,
>>
>> Jacob Seeley
>> Sr. Infrastructure Engineer
>> VertitechIT
>> 413-268-1631
>>
>> www.vertitechit.com
>>
>> -----Original Message-----
>> From: ilya [mailto:ilya.mailing.lists@gmail.com]
>> Sent: Wednesday, July 27, 2016 3:22 AM
>> To: users@cloudstack.apache.org
>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>
>> Jacob
>>
>> The upgrade usually occurs though systemvm.iso - that is generated by cloudstack on the first start.
>>
>> Please show the content of your secondary store specifically
>>
>> /mnt/[secondary-storage]/systemvm
>>
>> Regards
>> ilya
>>
>> On 7/25/16 11:19 AM, Jacob Seeley wrote:
>>> Here is a pastebin snippet the management-server.log - 
>>> http://pastebin.com/GCLm53Gz
>>>
>>> Hopefully the relevant data is in there.
>>>
>>> I made sure to start from scratch for this example. Everything from the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is fresh. I deployed a new instance in CloudStack, a VM internally named i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
>>>
>>> Thank you,
>>>
>>> Jacob Seeley
>>> Sr. Infrastructure Engineer
>>> VertitechIT
>>> 413-268-1631
>>>
>>> www.vertitechit.com
>>>
>>> -----Original Message-----
>>> From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com]
>>> Sent: Monday, July 25, 2016 1:37 AM
>>> To: users@cloudstack.apache.org
>>> Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting
>>>
>>> please upload the logs in the issue.
>>>> On Jul 5, 2016, at 8:46 AM, Darren Tang <da...@gmail.com> wrote:
>>>>
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
>>>>
>>>> 2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> What template are you using to start your first VM? - the default 
>>>>> vmware template?
>>>>> If you look in vcenter , what does the console show you ?
>>>>>
>>>>>
>>>>> Glenn
>>>>>
>>>>>
>>>>>
>>>>> glenn.wagner@shapeblue.com
>>>>> www.shapeblue.com
>>>>> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town 
>>>>> 7130South Africa @shapeblue
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Pascal R. [mailto:repa182@gmail.com]
>>>>> Sent: Monday, 04 July 2016 1:26 PM
>>>>> To: users@cloudstack.apache.org
>>>>> Subject: CS 4.8 VMware - Virtual Router stuck at starting
>>>>>
>>>>> hi,
>>>>>
>>>>> we have a CS4.8 deployment with VMWare 5.5.
>>>>>
>>>>> When trying to launch the first VM, the VS is created. VS starts 
>>>>> up, but in CS, it stuck with "starting" state.
>>>>>
>>>>> i can't find any usefull information in the logs.
>>>>>
>>>>> any hint?
>>>>>

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Jacob Seeley <js...@vertitechit.com>.
Hi ilya,

Funny you brought up debugging the router VM. After responding yesterday, I did just that, and I found some odd things.
Just to be clear, since I'm not the OP of this thread (though I think we're on the same page): the virtual router always gets deployed and starts up just fine; however, CloudStack always reports it as stuck in Starting. VMs that get deployed ultimately fail. CloudStack reports the router version as UNKNOWN.
Before I describe what I found while debugging the router VM, I'll address some of your points.

### FOLLOW-UP QUESTIONS ###

"Another reason would be an issue of hypervisor accessing the NFS mount used for secondary storage."
I don't believe this is an issue. The hypervisor (VMware) mounts the secondary storage via NFS just fine. If this were an issue, I would expect the Secondary Storage and Console Proxy VMs not to deploy.

"Use console of vCenter to see what is happening on router vm. You can login locally with root/password and see the content of /var/log/cloud.out file, paste it on pastebin - if it makes no sense to you..."
It looks to me like /var/log/cloud.out is only written when $CLOUD_DEBUG is set to a non-empty value in the /etc/init.d/cloud script. As such, /var/log/cloud.out doesn't even exist. Even when I set that variable, nothing gets logged to /var/log/cloud.out. However, there is a /var/log/cloud.log. Here are its contents: http://pastebin.com/aaTsRKZE

"you can also run /etc/init.d/cloud stop and start.. that will give you a fresh start on logs.."
The service is in a failed state. It's worth noting that this service is in a started state on the Console Proxy and Secondary Storage VMs.

"also, confirm that management server can talk to VR on POD IP
(management) on port 3922.."
It appears this is not an issue; see below:

root@r-4-VM:~# telnet 10.70.110.101 8250
Trying 10.70.110.101...
Connected to 10.70.110.101.
Escape character is '^]'.

### ROUTER VM DEBUG ###

Here is what I found when the router VM gets deployed (please tell me if anything seems off):
There are 2 NICs, but only one gets an IP address. In CloudStack, NIC1 shows an IP address from the defaultGuestNetwork. NIC2 has traffic type Control but an IP address of 0.0.0.0.
From the CloudStack management server, I cannot SSH into the router VM on NIC1. I've found this is because of iptables rules on the router VM. If I run /etc/init.d/iptables-persistent flush on the router VM, I can SSH into it using the SSH key on port 3922.
The "cloud" service is in a failed state. Looking at the cloud init script, I see the following:

CMDLINE=$(cat /var/cache/cloud/cmdline)

TYPE="router"
for i in $CMDLINE
  do
    # search for foo=bar pattern and cut out foo
    FIRSTPATTERN=$(echo $i | cut -d= -f1)
    case $FIRSTPATTERN in 
      type)
          TYPE=$(echo $i | cut -d= -f2)
      ;;
    esac
done

The file /var/cache/cloud/cmdline exists; here are its contents:

template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03

The previous code suggests that the value of TYPE starts as router but will get set to dhcpsrvr, as indicated by the contents of /var/cache/cloud/cmdline. Is this normal?
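
For reference, the loop boils down to "use the value of type= if it is present, otherwise keep the default". A minimal standalone sketch of that logic, with a made-up function name cmdline_value (it is not part of the init script) and a shortened cmdline string:

```shell
# Hypothetical helper mirroring the init script's loop: print the value of
# KEY from a space-separated key=value string, or DEFAULT if KEY is absent.
cmdline_value() {
    key=$1
    default=$2
    cmdline=$3
    result=$default
    for pair in $cmdline; do
        case $pair in
            "$key"=*) result=${pair#*=} ;;
        esac
    done
    printf '%s\n' "$result"
}

cmdline_value type router 'template=domP name=r-4-VM type=dhcpsrvr'   # prints dhcpsrvr
cmdline_value type router 'template=domP name=r-4-VM'                 # prints router
```

So with type=dhcpsrvr in the cmdline, ending up with TYPE=dhcpsrvr is simply what the script is written to do; whether dhcpsrvr is the expected type for this zone is a separate question.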
Further down the script, I see:

CLOUDSTACK_HOME="/usr/local/cloud" <----------------------------------------Exists
if [ -f  $CLOUDSTACK_HOME/systemvm/utils.sh ]; <----------------------------------------Does not exist. Seems odd!
then
  . $CLOUDSTACK_HOME/systemvm/utils.sh
else
  _failure
fi

# mkdir -p /var/log/vmops

start() {
   local pid=$(get_pids)
   if [ "$pid" != "" ]; then
       echo "CloudStack cloud sevice is already running, PID = $pid"
       return 0
   fi

   echo -n "Starting CloudStack cloud service (type=$TYPE) "
   if [ -f $CLOUDSTACK_HOME/systemvm/run.sh ]; <------------------------------------------------------Does not exist. Seems odd!
   then
     if [ "$pid" == "" ]
     then
       (cd $CLOUDSTACK_HOME/systemvm; nohup ./run.sh > $LOG_FILE 2>&1 & )
       pid=$(get_pids)
       echo $pid > /var/run/cloud.pid 
     fi
     _success
   else
     _failure
   fi
   echo
   echo 'start' > $CLOUDSTACK_HOME/systemvm/user_request
}

I see that it sets CLOUDSTACK_HOME to /usr/local/cloud. This folder exists; however, the script then looks for the file /usr/local/cloud/systemvm/utils.sh, which doesn't exist. It is also supposed to start the script run.sh, but that doesn't exist either. This seems like a problem to me.
Here is a step-through of what happens when I try to start the cloud service:

sh -x /etc/init.d/cloud start
+ ENABLED=0
+ [ -e /etc/default/cloud ]
+ . /etc/default/cloud
+ ENABLED=0
+ cat /var/cache/cloud/cmdline
+ CMDLINE= template=domP name=r-4-VM eth0ip=10.70.116.75 eth0mask=255.255.255.0 gateway=10.70.116.1 domain=vit.vertitechit.com cidrsize=24 dhcprange=10.70.116.1 eth1ip=0.0.0.0 eth1mask=0.0.0.0 mgmtcidr=10.70.110.0/24 localgw=10.70.116.1 sshonguest=true type=dhcpsrvr disable_rp_filter=true extra_pubnics=2 dns1=10.70.10.21 baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ host=10.70.110.101 port=8080 nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
+ [ ! -z ]
+ LOG_FILE=/dev/null
+ TYPE=router
+ cut -d= -f1
+ echo template=domP
+ FIRSTPATTERN=template
+ cut -d= -f1
+ echo name=r-4-VM
+ FIRSTPATTERN=name
+ cut -d= -f1
+ echo eth0ip=10.70.116.75
+ FIRSTPATTERN=eth0ip
+ cut -d= -f1
+ echo eth0mask=255.255.255.0
+ FIRSTPATTERN=eth0mask
+ cut -d= -f1
+ echo gateway=10.70.116.1
+ FIRSTPATTERN=gateway
+ cut -d= -f1
+ echo domain=vit.vertitechit.com
+ FIRSTPATTERN=domain
+ cut -d= -f1
+ echo cidrsize=24
+ FIRSTPATTERN=cidrsize
+ cut -d= -f1
+ echo dhcprange=10.70.116.1
+ FIRSTPATTERN=dhcprange
+ cut -d= -f1
+ echo eth1ip=0.0.0.0
+ FIRSTPATTERN=eth1ip
+ cut -d= -f1
+ echo eth1mask=0.0.0.0
+ FIRSTPATTERN=eth1mask
+ cut -d= -f1
+ echo mgmtcidr=10.70.110.0/24
+ FIRSTPATTERN=mgmtcidr
+ cut -d= -f1
+ echo localgw=10.70.116.1
+ FIRSTPATTERN=localgw
+ cut -d= -f1
+ echo sshonguest=true
+ FIRSTPATTERN=sshonguest
+ cut -d= -f1
+ echo type=dhcpsrvr
+ FIRSTPATTERN=type
+ cut -d= -f2
+ echo type=dhcpsrvr
+ TYPE=dhcpsrvr
+ cut -d= -f1
+ echo disable_rp_filter=true
+ FIRSTPATTERN=disable_rp_filter
+ cut -d= -f1
+ echo extra_pubnics=2
+ FIRSTPATTERN=extra_pubnics
+ cut -d= -f1
+ echo dns1=10.70.10.21
+ FIRSTPATTERN=dns1
+ cut -d= -f1
+ echo baremetalnotificationsecuritykey=nu1HfF_DpC-gK-G_3y1u54Snb9ruROq-qldOvhnHj4EMypguvtfQu0o18eY3gs81iPZMD2Du1QOUAG5KOfMYXQ
+ FIRSTPATTERN=baremetalnotificationsecuritykey
+ cut -d= -f1
+ echo baremetalnotificationapikey=CKZoOXffpY5ihjvzly3yD_2t2qaDnFglYFDoeep37aH1qy5u67aX51ZsuZpZcphfOxJY52rkTlNOl0nkNSyXjQ
+ FIRSTPATTERN=baremetalnotificationapikey
+ cut -d= -f1
+ echo host=10.70.110.101
+ FIRSTPATTERN=host
+ cut -d= -f1
+ echo port=8080
+ FIRSTPATTERN=port
+ cut -d= -f1
+ echo nic_macs=06:b1:2e:00:00:10|02:00:14:42:00:03
+ FIRSTPATTERN=nic_macs
+ [ -f /etc/init.d/functions ]
+ [ -f ./lib/lsb/init-functions ]
+ RETVAL=0
+ CLOUDSTACK_HOME=/usr/local/cloud
+ [ -f /usr/local/cloud/systemvm/utils.sh ]
+ _failure
+ [ -f /etc/init.d/functions ]
+ echo Failed
Failed
+ [ 0 != 0 ]
+ exit 0
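
For anyone reading the trace: the long run of "cut -d= -f1" / "echo key=value" lines appears to be the init script walking over each key=value token from /var/cache/cloud/cmdline and picking out "type". Below is a minimal sketch of that loop, reconstructed from the trace output rather than copied from the script. The function name get_vm_type is made up for illustration, and the "router" default is taken from the TYPE=router line in the trace.

```shell
#!/bin/sh
# Sketch only: extract the value of "type" from a boot cmdline string,
# mirroring the key/value loop visible in the trace above.
# get_vm_type is a hypothetical helper, not part of CloudStack.
get_vm_type() {
    cmdline=$1
    vm_type=router                 # default seen in the trace (TYPE=router)
    for token in $cmdline; do      # split on whitespace, one key=value per token
        key=$(echo "$token" | cut -d= -f1)
        value=$(echo "$token" | cut -d= -f2)
        if [ "$key" = "type" ]; then
            vm_type=$value
        fi
    done
    echo "$vm_type"
}

# Example with a shortened version of the cmdline from the trace.
get_vm_type "template=domP name=r-4-VM type=dhcpsrvr port=8080"
```

With the cmdline shown in the trace this would end up with TYPE=dhcpsrvr, which matches the "+ TYPE=dhcpsrvr" line.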

Thoughts?

Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631

www.vertitechit.com

-----Original Message-----
From: ilya [mailto:ilya.mailing.lists@gmail.com] 
Sent: Wednesday, July 27, 2016 8:43 PM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

Hi Jacob

I gave this a second read. If the router VM is stuck in the Starting state and never finishes starting, it means the CloudStack agent on the router VM cannot talk to the management server on port 8250 over the pod network.

Another possible cause is the hypervisor having trouble accessing the NFS mount used for secondary storage.

Use the vCenter console to see what is happening on the router VM. You can log in locally with root/password and look at the contents of /var/log/cloud.out. If it makes no sense to you, paste it on pastebin.

You can also run /etc/init.d/cloud stop and then start; that will give you a fresh set of logs.

Also, confirm that the management server can reach the VR on its pod (management) IP on port 3922.

Regards
ilya


Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by ilya <il...@gmail.com>.
Hi Jacob

I gave this a second read. If the router VM is stuck in the Starting state
and never finishes starting, it means the CloudStack agent on the router VM
cannot talk to the management server on port 8250 over the pod network.

Another possible cause is the hypervisor having trouble accessing the NFS
mount used for secondary storage.

Use the vCenter console to see what is happening on the router VM. You can
log in locally with root/password and look at the contents of
/var/log/cloud.out. If it makes no sense to you, paste it on pastebin.

You can also run /etc/init.d/cloud stop and then start; that will give you a
fresh set of logs.

Also, confirm that the management server can reach the VR on its pod
(management) IP on port 3922.
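
A rough way to test the two ports mentioned above (8250 towards the management server, 3922 towards the VR) is a plain TCP connect check. This is only a sketch: port_open is a made-up helper name, it assumes bash and the coreutils "timeout" command are available on the machine you run it from, and the IP variables in the comments are placeholders you would fill in yourself.

```shell
#!/bin/sh
# Sketch only: return 0 if a TCP connection to host:port succeeds within
# 3 seconds, non-zero otherwise. port_open is a hypothetical helper.
# It relies on bash's /dev/tcp redirection, so bash must be installed.
port_open() {
    host=$1
    port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage (placeholders, not real addresses):
#   on the router VM:          port_open "$MGMT_SERVER_IP" 8250 && echo "8250 reachable"
#   on the management server:  port_open "$VR_POD_IP" 3922 && echo "3922 reachable"
```

A failure here only says the TCP connection could not be made; it does not tell you whether a firewall, routing, or the service itself is the cause.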

Regards
ilya

On 7/27/16 9:34 AM, Jacob Seeley wrote:
> ilya,
> 
> Here are the contents of the secondary storage:
> 
> .
> ./template
> ./template/tmpl
> ./template/tmpl/1
> ./template/tmpl/1/8
> ./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
> ./template/tmpl/1/8/template.properties
> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
> ./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
> ./template/tmpl/1/7
> ./template/tmpl/1/7/template.properties
> ./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
> ./template/tmpl/1/7/CentOS5.3-x86_64.ovf
> ./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
> ./template/tmpl/1/7/CentOS5.3-x86_64.mf
> ./systemvm
> ./systemvm/systemvm-4.8.0.1.iso
> ./systemvm/.lck-bf162a0100000000
> ./snapshots
> ./volumes
> 
> I've noticed that both the Secondary Storage VM and Console Proxy VM mount this ISO and as stated before, they come up just fine.
> 
> Regards,
> 
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> 
> www.vertitechit.com
> 

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Jacob Seeley <js...@vertitechit.com>.
ilya,

Here are the contents of the secondary storage:

.
./template
./template/tmpl
./template/tmpl/1
./template/tmpl/1/8
./template/tmpl/1/8/49a4c4ee-ef06-4474-92c3-1d8efb082266.ova
./template/tmpl/1/8/template.properties
./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware.ovf
./template/tmpl/1/8/systemvm64template-4.6.0-RC20151104T1522-4.6.0-vmware-disk3.vmdk
./template/tmpl/1/7
./template/tmpl/1/7/template.properties
./template/tmpl/1/7/0098d168-4985-3b33-9840-eb5848d2f385.ova
./template/tmpl/1/7/CentOS5.3-x86_64.ovf
./template/tmpl/1/7/CentOS5.3-x86_64-disk1.vmdk
./template/tmpl/1/7/CentOS5.3-x86_64.mf
./systemvm
./systemvm/systemvm-4.8.0.1.iso
./systemvm/.lck-bf162a0100000000
./snapshots
./volumes

I've noticed that both the Secondary Storage VM and Console Proxy VM mount this ISO and as stated before, they come up just fine.

Regards,

Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631

www.vertitechit.com

-----Original Message-----
From: ilya [mailto:ilya.mailing.lists@gmail.com] 
Sent: Wednesday, July 27, 2016 3:22 AM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

Jacob

The upgrade usually occurs through systemvm.iso, which is generated by CloudStack on the first start.

Please show the contents of your secondary storage, specifically:

/mnt/[secondary-storage]/systemvm

Regards
ilya


Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by ilya <il...@gmail.com>.
Jacob

The upgrade usually occurs through systemvm.iso, which is generated by
CloudStack on the first start.

Please show the contents of your secondary storage, specifically:

/mnt/[secondary-storage]/systemvm
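
One quick way to answer this is to list any ISO files sitting in that directory. The snippet below is a sketch under assumptions: list_systemvm_isos is a made-up helper name, and you pass it the systemvm directory under wherever your secondary storage is actually mounted. It uses find with -maxdepth, which GNU and BSD find support.

```shell
#!/bin/sh
# Sketch only: print the ISO files found directly inside the given
# directory, one per line. list_systemvm_isos is a hypothetical helper.
list_systemvm_isos() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # Only regular files ending in .iso, not lock files or subdirectories.
    find "$dir" -maxdepth 1 -type f -name '*.iso' | sort
}

# Example usage: pass the systemvm directory of your mounted secondary storage.
#   list_systemvm_isos /path/to/secondary/systemvm
```

If nothing is printed, the ISO has not been generated or copied to secondary storage yet.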

Regards
ilya

On 7/25/16 11:19 AM, Jacob Seeley wrote:
> Here is a pastebin snippet of the management-server.log - http://pastebin.com/GCLm53Gz
> 
> Hopefully the relevant data is in there.
> 
> I made sure to start from scratch for this example. Everything from the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is fresh. I deployed a new instance in CloudStack, a VM internally named i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.
> 
> Thank you,
> 
> Jacob Seeley
> Sr. Infrastructure Engineer
> VertitechIT
> 413-268-1631
> 
> www.vertitechit.com
> 

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Jacob Seeley <js...@vertitechit.com>.
Here is a pastebin snippet of the management-server.log - http://pastebin.com/GCLm53Gz

Hopefully the relevant data is in there.

I made sure to start from scratch for this example. Everything from the vSphere ESXi to the vCenter to the CentOS 7 with CloudStack install is fresh. I deployed a new instance in CloudStack, a VM internally named i-2-3-VM with an IP address of 192.168.0.78. This prompted CloudStack to deploy a VR. The VR is called r-4-VM with an IP address of 192.168.0.79.

Thank you,

Jacob Seeley
Sr. Infrastructure Engineer
VertitechIT
413-268-1631

www.vertitechit.com

-----Original Message-----
From: Suresh Sadhu [mailto:suresh.sadhu@accelerite.com] 
Sent: Monday, July 25, 2016 1:37 AM
To: users@cloudstack.apache.org
Subject: Re: CS 4.8 VMware - Virtual Router stuck at starting

Please upload the logs to the issue.

Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Suresh Sadhu <su...@accelerite.com>.
Please upload the logs to the issue.
> On Jul 5, 2016, at 8:46 AM, Darren Tang <da...@gmail.com> wrote:
> 
> https://issues.apache.org/jira/browse/CLOUDSTACK-9144
> 

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Accelerite, a Persistent Systems business. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Accelerite, a Persistent Systems business does not accept any liability for virus infected mails.

Re: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Darren Tang <da...@gmail.com>.
https://issues.apache.org/jira/browse/CLOUDSTACK-9144

2016-07-04 19:41 GMT+08:00 Glenn Wagner <gl...@shapeblue.com>:

> Hi,
>
> What template are you using to start your first VM? The default VMware
> template?
> If you look in vCenter, what does the console show you?
>
>
> Glenn
>
>
>
> glenn.wagner@shapeblue.com
> www.shapeblue.com
> 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town
> 7130, South Africa
> @shapeblue

RE: CS 4.8 VMware - Virtual Router stuck at starting

Posted by Glenn Wagner <gl...@shapeblue.com>.
Hi,

What template are you using to start your first VM? The default VMware template?
If you look in vCenter, what does the console show you?


Glenn



glenn.wagner@shapeblue.com
www.shapeblue.com
2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West, Cape Town 7130, South Africa
@shapeblue


-----Original Message-----
From: Pascal R. [mailto:repa182@gmail.com] 
Sent: Monday, 04 July 2016 1:26 PM
To: users@cloudstack.apache.org
Subject: CS 4.8 VMware - Virtual Router stuck at starting

hi,

we have a CS4.8 deployment with VMWare 5.5.

When trying to launch the first VM, the VS is created. The VS starts up, but in CS it is stuck in the "starting" state.

I can't find any useful information in the logs.

any hint?