Posted to dev@vcl.apache.org by Curtis <se...@gmail.com> on 2014/04/02 00:16:17 UTC

sshd doesn't start after new image creation process reboots, imaging fails

Hi All,

We are having an issue with some of our images: when we try to create a
new image from an existing image, everything goes fine until the virtual
machine is rebooted. After the reboot, sshd does not start up and the
imaging process fails.

Anyone have any thoughts? I'm fairly sure it has something to do with
the various commands that are run on the image once an image creation
process starts.

Thanks,
Curtis.

-- 
Twitter: @serverascode
Blog: serverascode.com

Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Curtis <se...@gmail.com>.
On Wed, Apr 2, 2014 at 3:35 PM, Andy Kurth <an...@ncsu.edu> wrote:
> Your output looks almost the same as when I successfully ssh in to a
> working VM here.  The only difference I can see up to when yours times
> out is that the last line refers to "vcl.key-cert":
>
> Yours:
> debug3: key_read: missing keytype
> debug1: identity file /etc/vcl/vcl.key type 1
> debug1: identity file /etc/vcl/vcl.key-cert type -1
>
> Ours:
> debug3: key_read: missing keytype
> debug1: identity file /etc/vcl/vcl.key type -1
>
> While logged in as root, you can try stopping the sshd service and
> then from a Cygwin shell, run:
> /usr/sbin/sshd.exe -ddd
>
> Then try to connect from the management node.  The debugging output
> from sshd.exe should be displayed in the Cygwin window.  What does it
> look like?  I'll compare it with one of ours.  You can also try the
> same on a working computer and compare the output.
>

Every time I get one of these hung sshd processes it's the same thing: I
can't restart sshd with cygrunsrv or the Windows services console, but I
can kill it with taskkill, then start it, and it works fine.

I did capture the output of an sshd -ddd session, but it only shows a
normal, working connection, because once sshd has been killed and
restarted it works fine.

Thanks,
Curtis.

>
> On Wed, Apr 2, 2014 at 3:41 PM, Curtis <se...@gmail.com> wrote:
>>
>> On Wed, Apr 2, 2014 at 11:06 AM, Andy Kurth <an...@ncsu.edu> wrote:
>> > It looks like ssh on the management node is using a ConnectTimeout value of
>> > 2 seconds:
>> > debug3: timeout: 1999 ms remain after connect
>> >
>> > Does specifying a longer time make a difference?
>> > ssh -o ConnectTimeout=10 -vvvv vm79
>> >
>>
>> No, doesn't seem to change anything. Though I had set the connect
>> timeout to 2 only recently because I was testing rebooting virtual
>> machines and seeing if I could connect via ssh to them after a reboot,
>> so it was set to whatever the default was before when it first started
>> breaking.
>>
>> Below is a session with it set to 10.
>>
>> [root@VCL-PROD:~] $ ssh -vvvv vm79
>> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
>> debug1: Reading configuration data /root/.ssh/config
>> debug1: Applying options for vm*
>> debug1: Reading configuration data /etc/ssh/ssh_config
>> debug1: Applying options for *
>> debug2: ssh_connect: needpriv 0
>> debug1: Connecting to vm79 [10.1.0.195] port 22.
>> debug2: fd 3 setting O_NONBLOCK
>> debug1: fd 3 clearing O_NONBLOCK
>> debug1: Connection established.
>> debug3: timeout: 10000 ms remain after connect
>> debug1: permanently_set_uid: 0/0
>> debug3: Not a RSA1 key file /etc/vcl/vcl.key.
>> debug2: key_type_from_name: unknown key type '-----BEGIN'
>> debug3: key_read: missing keytype
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug2: key_type_from_name: unknown key type '-----END'
>> debug3: key_read: missing keytype
>> debug1: identity file /etc/vcl/vcl.key type 1
>> debug1: identity file /etc/vcl/vcl.key-cert type -1
>> Connection timed out during banner exchange
>>
>> >
>> >
>> > On Wed, Apr 2, 2014 at 11:19 AM, Curtis <se...@gmail.com> wrote:
>> >
>> >> Hi Andy,
>> >>
>> >> Thanks, inline...
>> >>
>> >> On Wed, Apr 2, 2014 at 8:22 AM, Andy Kurth <an...@ncsu.edu> wrote:
>> >> > I can't tell from just the commands.  They look normal.  Were there any
>> >> > WARNING messages during the image process prior to the reboot?
>> >> >
>> >> > What error message is reported when you try to ssh from the management
>> >> > node? (Connection timed out, etc)  It may be helpful if you send the output
>> >> > from running "ssh -v <win_computer>".
>> >> >
>> >>
>> >> This is what that output looks like:
>> >>
>> >> [root@VCL-PROD:~] $ ssh -vvvv vm79
>> >> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
>> >> debug1: Reading configuration data /root/.ssh/config
>> >> debug1: Applying options for vm*
>> >> debug1: Reading configuration data /etc/ssh/ssh_config
>> >> debug1: Applying options for *
>> >> debug2: ssh_connect: needpriv 0
>> >> debug1: Connecting to vm79 [10.1.0.195] port 22.
>> >> debug2: fd 3 setting O_NONBLOCK
>> >> debug1: fd 3 clearing O_NONBLOCK
>> >> debug1: Connection established.
>> >> debug3: timeout: 1999 ms remain after connect
>> >> debug1: permanently_set_uid: 0/0
>> >> debug3: Not a RSA1 key file /etc/vcl/vcl.key.
>> >> debug2: key_type_from_name: unknown key type '-----BEGIN'
>> >> debug3: key_read: missing keytype
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug2: key_type_from_name: unknown key type '-----END'
>> >> debug3: key_read: missing keytype
>> >> debug1: identity file /etc/vcl/vcl.key type 1
>> >> debug1: identity file /etc/vcl/vcl.key-cert type -1
>> >> Connection timed out during banner exchange
>> >>
>> >> > To troubleshoot, you'll need to log in as root using the password which was
>> >> > redacted from the vcld.log output.  Check the following:
>> >> >
>> >> > Is the Cygwin SSHD service running?  If not, try to start it.  If you get
>> >> > an error related to incorrect credentials then something went wrong when
>> >> > root's password was set early on in the image capture process.
>> >>
>> >> It's usually hung, i.e. it won't respond to commands.
>> >>
>> >> If I log in to the VM on its console (with virt-manager), sshd can't be
>> >> restarted from the Windows services console or with cygrunsrv, but if I
>> >> kill it with taskkill and then start it, it starts up fine.
>> >>
>> >> Something to do with long logon times maybe?
>> >>
>> >> >
>> >> > If SSHD is running, it could be a firewall problem.  Try simply turning off
>> >> > the firewall temporarily on the Windows computer and try to ssh from the
>> >> > management node.
>> >>
>> >> The Windows firewall is not on, or at least it reports that it's not on.
>> >> It's turned off in the image.
>> >>
>> >> >
>> >> > If the firewall isn't the problem, something isn't configured correctly
>> >> > with the sshd service.  While logged in as root, you can try running
>> >> > C:\cygwin\root\VCL\Scripts\update_cygwin.cmd.  This gets run automatically
>> >> > when an image is loaded; it configures sshd correctly and starts the
>> >> > service.  If running this solves the problem, then you'll have to figure
>> >> > out which commands or changes made by this script fixed it.  If possible,
>> >> > take a snapshot of the computer before running this script; it will be
>> >> > easier to troubleshoot if you can revert to the broken state to narrow
>> >> > down the problem.
>> >> >
>> >>
>> >> Ok will give the update_cygwin.cmd a shot.
>> >>
>> >> Thanks,
>> >> Curtis.
>> >>
>> >> > -Andy
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Apr 1, 2014 at 6:28 PM, Curtis <se...@gmail.com> wrote:
>> >> >
>> >> >> On Tue, Apr 1, 2014 at 4:16 PM, Curtis <se...@gmail.com> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > We are having an issue with some of our images where when we try to
>> >> >> > create a new image from an existing image, everything goes ok until
>> >> >> > the part where the virtual machine is rebooted, and after it's
>> >> >> > rebooted sshd does not start up and the imaging process fails.
>> >> >> >
>> >> >> > Anyone have any thoughts? I'm fairly sure it has something to do with
>> >> >> > the various commands that are run on the image once an image creation
>> >> >> > process starts.
>> >> >>
>> >> >> Also, this gist has all the commands that are being run:
>> >> >>
>> >> >> https://gist.github.com/curtisgithub/6117a73b47e994d9be03
>> >> >>
>> >> >> But I'm not much of a windows administrator -- does anyone see
>> >> >> anything unusual in that gist that might be causing issues? Perhaps
>> >> >> something with the root logon or password?
>> >> >>
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Curtis.
>> >> >> >
>> >> >> > --
>> >> >> > Twitter: @serverascode
>> >> >> > Blog: serverascode.com
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Twitter: @serverascode
>> >> >> Blog: serverascode.com
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Twitter: @serverascode
>> >> Blog: serverascode.com
>> >>
>>
>>
>>
>> --
>> Twitter: @serverascode
>> Blog: serverascode.com




Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Andy Kurth <an...@ncsu.edu>.
Your output looks almost the same as when I successfully ssh in to a
working VM here.  The only difference I can see up to when yours times
out is that the last line refers to "vcl.key-cert":

Yours:
debug3: key_read: missing keytype
debug1: identity file /etc/vcl/vcl.key type 1
debug1: identity file /etc/vcl/vcl.key-cert type -1

Ours:
debug3: key_read: missing keytype
debug1: identity file /etc/vcl/vcl.key type -1

While logged in as root, you can try stopping the sshd service and
then from a Cygwin shell, run:
/usr/sbin/sshd.exe -ddd

Then try to connect from the management node.  The debugging output
from sshd.exe should be displayed in the Cygwin window.  What does it
look like?  I'll compare it with one of ours.  You can also try the
same on a working computer and compare the output.
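Comparing the working and broken runs by eye, as suggested above, is easy to get wrong because some fields differ on every run. A minimal Python sketch, assuming each `ssh -vvvv` (or `sshd -ddd`) session has been saved to a text file: it masks values that change from run to run (the remaining-timeout milliseconds and IPv4 addresses) and then produces a unified diff, so lines such as the `vcl.key` `type 1` versus `type -1` difference stand out. The masking rules, function names and script name are illustrative assumptions, not part of VCL.

```python
import difflib
import re
import sys

# Values that legitimately change between otherwise identical runs.
# These masking rules are illustrative; extend them for your own logs.
VOLATILE = [
    (re.compile(r"timeout: \d+ ms remain"), "timeout: N ms remain"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "A.B.C.D"),  # IPv4 addresses
]


def normalize(line):
    """Drop trailing whitespace and mask volatile values in one log line."""
    line = line.rstrip()
    for pattern, replacement in VOLATILE:
        line = pattern.sub(replacement, line)
    return line


def diff_debug_logs(working_lines, broken_lines):
    """Return unified-diff lines between two normalized debug logs."""
    working = [normalize(line) for line in working_lines]
    broken = [normalize(line) for line in broken_lines]
    return list(difflib.unified_diff(working, broken, "working", "broken", lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python3 difflogs.py working.txt broken.txt
    with open(sys.argv[1]) as working_file, open(sys.argv[2]) as broken_file:
        for diff_line in diff_debug_logs(working_file.readlines(), broken_file.readlines()):
            print(diff_line)
```

Identical logs produce an empty diff, so any output at all is worth a closer look.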



Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Curtis <se...@gmail.com>.
On Wed, Apr 2, 2014 at 11:06 AM, Andy Kurth <an...@ncsu.edu> wrote:
> It looks like ssh on the management node is using a ConnectTimeout value of
> 2 seconds:
> debug3: timeout: 1999 ms remain after connect
>
> Does specifying a longer time make a difference?
> ssh -o ConnectTimeout=10 -vvvv vm79
>

No, that doesn't seem to change anything. I only set ConnectTimeout to 2
recently, because I was testing whether I could connect to virtual
machines via ssh after rebooting them; when the problem first started,
it was still set to the default.

Below is a session with it set to 10.

[root@VCL-PROD:~] $ ssh -vvvv vm79
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /root/.ssh/config
debug1: Applying options for vm*
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to vm79 [10.1.0.195] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 10000 ms remain after connect
debug1: permanently_set_uid: 0/0
debug3: Not a RSA1 key file /etc/vcl/vcl.key.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /etc/vcl/vcl.key type 1
debug1: identity file /etc/vcl/vcl.key-cert type -1
Connection timed out during banner exchange


Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Andy Kurth <an...@ncsu.edu>.
It looks like ssh on the management node is using a ConnectTimeout value of
2 seconds:
debug3: timeout: 1999 ms remain after connect

Does specifying a longer time make a difference?
ssh -o ConnectTimeout=10 -vvvv vm79
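In the logs in this thread, ssh prints how much of ConnectTimeout is left after the TCP connect (`timeout: 1999 ms remain after connect`) and then fails with `Connection timed out during banner exchange`, so the remaining time appears to bound the wait for the server's banner as well. While a VM is rebooting, a moderate per-attempt timeout combined with retries is often more practical than one long timeout. A minimal Python sketch, assuming key-based, non-interactive login from the management node and an image where `true` exists as a remote command (Cygwin provides it); the function name, deadline and interval are hypothetical, and the `runner`, `sleep` and `clock` parameters exist only so the loop can be exercised without a real VM.

```python
import subprocess
import time


def wait_for_ssh(host, deadline_s=300, connect_timeout=10, interval_s=5,
                 runner=subprocess.run, sleep=time.sleep, clock=time.monotonic):
    """Retry a trivial ssh login until it succeeds or the deadline passes.

    Returns the number of attempts on success, or None on timeout.
    """
    cmd = [
        "ssh",
        "-o", f"ConnectTimeout={connect_timeout}",  # per-attempt limit
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        host,
        "true",  # no-op remote command; exit status 0 means login worked
    ]
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        result = runner(cmd, capture_output=True, text=True, check=False)
        if result.returncode == 0:
            return attempts
        # Give up if the next attempt would start after the deadline.
        if clock() - start + interval_s > deadline_s:
            return None
        sleep(interval_s)
```

For example, `wait_for_ssh("vm79")` returns the attempt count once a login succeeds, or `None` after roughly five minutes of failures.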




Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Curtis <se...@gmail.com>.
Hi Andy,

Thanks, inline...

On Wed, Apr 2, 2014 at 8:22 AM, Andy Kurth <an...@ncsu.edu> wrote:
> I can't tell from just the commands.  They look normal.  Were there any
> WARNING messages during the image process prior to the reboot?
>
> What error message is reported when you try to ssh from the management
> node? (Connection timed out, etc)  It may be helpful if you send the output
> from running "ssh -v <win_computer>".
>

This is what that output looks like:

[root@VCL-PROD:~] $ ssh -vvvv vm79
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /root/.ssh/config
debug1: Applying options for vm*
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to vm79 [10.1.0.195] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 1999 ms remain after connect
debug1: permanently_set_uid: 0/0
debug3: Not a RSA1 key file /etc/vcl/vcl.key.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /etc/vcl/vcl.key type 1
debug1: identity file /etc/vcl/vcl.key-cert type -1
Connection timed out during banner exchange
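The TCP connection was established and the failure came during the banner exchange. That means something accepted the connection on port 22, but sshd never sent its identification string. Under the SSH transport protocol (RFC 4253), the server normally sends a line beginning with "SSH-" right after connecting. Below is a minimal Python 3 sketch that probes a host from the management node and classifies the result. The function name and result labels are hypothetical and not part of VCL:

```python
import socket


def probe_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify how an SSH endpoint responds within `timeout` seconds.

    Labels (this sketch's own, hypothetical names):
      "refused"     - connection refused: nothing is listening on the port
      "unreachable" - TCP connect failed or timed out
      "no-banner"   - TCP connected, but no data arrived before the timeout
                      (the "timed out during banner exchange" case)
      "closed"      - TCP connected, then the server closed/reset without data
      "ok"          - the first bytes received start with b"SSH-"
      "other"       - data arrived but did not start with b"SSH-"; servers may
                      legally send other lines first, so this is not proof of failure
    """
    try:
        # create_connection also applies `timeout` to later socket operations.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers connect timeouts, name resolution failures and routing errors.
        return "unreachable"
    with sock:
        try:
            data = sock.recv(256)
        except socket.timeout:
            return "no-banner"
        except OSError:
            # For example, the connection was reset by the peer.
            return "closed"
    if not data:
        return "closed"
    return "ok" if data.startswith(b"SSH-") else "other"
```

For example, probe_ssh_banner("vm79") returning "no-banner" would match the hung-sshd symptom, while "refused" would point to sshd not running at all. The "timeout: 1999 ms remain after connect" line suggests that a ConnectTimeout of about 2 seconds is set for vm* hosts in /root/.ssh/config, although this is only an inference from the log. Probing with a longer timeout could help tell a slow sshd from a hung one.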

> To troubleshoot, you'll need to log in as root using the password which was
> redacted from the vcld.log output.  Check the following:
>
> Is the Cygwin SSHD service running?  If not, try to start it.  If you get
> an error related to incorrect credentials then something went wrong when
> root's password was set early on in the image capture process.

It's usually hung, i.e. it won't respond to commands.

If I log in to the VM on its console (with virt-manager), sshd
can't be restarted from the Windows Services console or with cygrunsrv, but
if I kill it with taskkill and then start it, it starts up fine.

Something to do with long logon times maybe?
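The workaround above could be scripted on the Windows side. It force-kills the hung process and then starts the service normally. Below is a minimal sketch. It assumes that Python is available on the image, that the Cygwin service is named sshd, that the process image is sshd.exe, and that both taskkill and cygrunsrv are on PATH. The function name is hypothetical. By default it only returns the commands it would run:

```python
import subprocess
from typing import List


def force_restart_sshd(service: str = "sshd", image: str = "sshd.exe",
                       dry_run: bool = True) -> List[List[str]]:
    """Kill a hung sshd process, then start the Cygwin service again."""
    kill = ["taskkill", "/F", "/IM", image]    # /F forces termination
    start = ["cygrunsrv", "--start", service]  # equivalent to cygrunsrv -S
    if not dry_run:
        # taskkill exits non-zero if no matching process exists; ignore that.
        subprocess.run(kill, check=False)
        # If the service still refuses to start, raise CalledProcessError.
        subprocess.run(start, check=True)
    return [kill, start]
```

This treats the symptom rather than the cause. It is mainly useful for getting a VM back into a reachable state while troubleshooting the reboot behaviour.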

>
> If SSHD is running, it could be a firewall problem.  Try simply turning off
> the firewall temporarily on the Windows computer and try to ssh from the
> management node.

The Windows firewall is not on, or at least it reports that it's off.
It's turned off in the image.

>
> If the firewall isn't the problem, something isn't configured correctly
> with the sshd service.  While logged in as root, you can try running
> C:\cygwin\root\VCL\Scripts\update_cygwin.cmd.  This gets run automatically
> when an image is loaded and configures sshd correctly and starts the
> service.  If running this solves the problem, then you'll have to figure
> out which commands or changes made by this script fixed it.  If possible,
> it will be easier to troubleshoot if you take a snapshot of the computer
> before running this script so that you can revert to the broken state in
> order to narrow down the problem.
>

OK, I'll give update_cygwin.cmd a shot.

Thanks,
Curtis.

> -Andy




Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Andy Kurth <an...@ncsu.edu>.
I can't tell from just the commands.  They look normal.  Were there any
WARNING messages during the image process prior to the reboot?

What error message is reported when you try to ssh from the management
node? (Connection timed out, etc)  It may be helpful if you send the output
from running "ssh -v <win_computer>".

To troubleshoot, you'll need to log in as root using the password which was
redacted from the vcld.log output.  Check the following:

Is the Cygwin SSHD service running?  If not, try to start it.  If you get
an error related to incorrect credentials then something went wrong when
root's password was set early on in the image capture process.
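To check the service from a command prompt instead of the Services console, `sc query <service>` reports the service state. Below is a minimal Python sketch that extracts that state. It assumes the Cygwin service is registered as sshd, and the helper names are hypothetical. A state stuck at START_PENDING or STOP_PENDING would be consistent with a hung service, as opposed to one that is simply stopped:

```python
import re
import subprocess
from typing import Optional

# Matches lines such as "        STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+([A-Z_]+)", re.MULTILINE)


def parse_sc_state(output: str) -> Optional[str]:
    """Return the state name (RUNNING, STOPPED, STOP_PENDING, ...) or None."""
    match = _STATE_RE.search(output)
    return match.group(1) if match else None


def query_service_state(service: str = "sshd") -> Optional[str]:
    """Run `sc.exe query <service>` (Windows only) and parse its state."""
    # Use sc.exe, not sc: in Windows PowerShell, sc is an alias for Set-Content.
    result = subprocess.run(["sc.exe", "query", service],
                            capture_output=True, text=True, check=False)
    return parse_sc_state(result.stdout)
```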

If SSHD is running, it could be a firewall problem.  Try simply turning off
the firewall temporarily on the Windows computer and try to ssh from the
management node.

If the firewall isn't the problem, something isn't configured correctly
with the sshd service.  While logged in as root, you can try running
C:\cygwin\root\VCL\Scripts\update_cygwin.cmd.  This gets run automatically
when an image is loaded and configures sshd correctly and starts the
service.  If running this solves the problem, then you'll have to figure
out which commands or changes made by this script fixed it.  If possible,
it will be easier to troubleshoot if you take a snapshot of the computer
before running this script so that you can revert to the broken state in
order to narrow down the problem.
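The VMs in this thread are managed with virt-manager, so libvirt's virsh could take and revert such a snapshot. Below is a minimal sketch. It assumes a libvirt/KVM host and disk images that support internal snapshots (typically qcow2). The domain and snapshot names are hypothetical, and the libvirt domain name may differ from the guest's hostname. The helper names are also hypothetical. By default the sketch returns the commands instead of running them:

```python
import subprocess
from typing import List


def snapshot_cmd(domain: str, name: str) -> List[str]:
    """virsh command that creates a named snapshot of `domain`."""
    return ["virsh", "snapshot-create-as", domain, name]


def revert_cmd(domain: str, name: str) -> List[str]:
    """virsh command that reverts `domain` to the named snapshot."""
    return ["virsh", "snapshot-revert", domain, name]


def run(cmd: List[str], dry_run: bool = True) -> List[str]:
    """Run `cmd` on the hypervisor unless dry_run is set; return it either way."""
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, run(snapshot_cmd("vm79", "before-update-cygwin"), dry_run=False) would capture the broken state before update_cygwin.cmd is run on the guest. The matching revert_cmd call would return the VM to that state for another attempt.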

-Andy





Re: sshd doesn't start after new image creation process reboots, imaging fails

Posted by Curtis <se...@gmail.com>.
On Tue, Apr 1, 2014 at 4:16 PM, Curtis <se...@gmail.com> wrote:
> Hi All,
>
> We are having an issue with some of our images where when we try to
> create a new image from an existing image, everything goes ok until
> the part where the virtual machine is rebooted, and after it's
> rebooted sshd does not start up and the imaging process fails.
>
> Anyone have any thoughts? I'm fairly sure it has something to do with
> the various commands that are run on the image once an image creation
> process starts.

Also, this gist has all the commands that are being run:

https://gist.github.com/curtisgithub/6117a73b47e994d9be03

But I'm not much of a Windows administrator -- does anyone see
anything unusual in that gist that might be causing issues? Perhaps
something with the root logon or password?

>
> Thanks,
> Curtis.


