Posted to user@vcl.apache.org by Andy Kurth <an...@ncsu.edu> on 2012/09/14 16:22:03 UTC

Re: New capture failed attempting Windows post-load tasks

Yes.  During image capture, all of the files under C:\cygwin\home\root\VCL
get deleted and then the files from /usr/local/vcl/tools on the
management node are copied.  The captured image should always get the
version on the management node.  It sounds like you experienced otherwise.
You should see this process happening in the vcld.log output on lines
containing 'copy_capture_configuration_files'.
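
To check this on your side, something like the following could pull those
lines out of the log. This is only a sketch: show_capture_copy_lines is a
made-up helper name, and /var/log/vcld.log is an assumed default path, so
pass your actual log location if it differs.

```shell
#!/bin/sh
# Print (with line numbers) the vcld.log lines that mention
# copy_capture_configuration_files.
# Assumption: /var/log/vcld.log is only a guessed default; pass the real
# path as the first argument.
show_capture_copy_lines() {
    log_file="${1:-/var/log/vcld.log}"
    grep -n 'copy_capture_configuration_files' "$log_file"
}

# Usage (not run here):
#   show_capture_copy_lines /var/log/vcld.log
```

If nothing is printed for the capture in question, the copy step may not have
run for that image.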

On Fri, Aug 17, 2012 at 12:52 PM, Hechler, Adam <he...@rpi.edu> wrote:

> Many thanks Andy,
>
> It appears that this was our problem. I managed to get one image up to
> date and I'm working with it now.
>
> Although, I created an image after we updated the file on the management
> node and I still had the problems with this new image. Shouldn't the image
> have gotten the file from the management node and therefore should have had
> the newer version? I'm curious why it was having problems. I'm trying to
> learn how this all works, but my history is mostly as a PC guy and not a
> unix guy. It seems like you really need to know some unix to start
> understanding how this all works.
> But hey, it's all working now.
>
> Thanks,
> Adam
>
> > -----Original Message-----
> > From: Andy Kurth [mailto:andy_kurth@ncsu.edu]
> > Sent: Monday, August 13, 2012 1:59 PM
> > To: user@vcl.apache.org
> > Subject: Re: New capture failed attempting Windows post-load tasks
> >
> > A few minor changes when updating the file on the computer being loaded:
> >
> > Change https to http or else wget may fail:
> > * wget http://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd
> > * Set the file to executable after you download it: chmod +x update_cygwin.cmd
> > * Manually run update_cygwin.cmd: ./update_cygwin.cmd
> > * After running update_cygwin.cmd, log off as root.
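
Put together, the in-image steps might look like the sketch below, run from
the Cygwin shell while logged in as root. refresh_update_cygwin is a
hypothetical helper name, not something VCL provides; the directory and the
http URL are the ones mentioned in this thread.

```shell
#!/bin/sh
# Replace update_cygwin.cmd inside the image and run it.
# Assumptions: wget is installed in Cygwin; the defaults below are the
# directory and URL quoted in this thread. The function body runs in a
# subshell so the caller's working directory is left alone.
refresh_update_cygwin() (
    scripts_dir="${1:-$HOME/VCL/Scripts}"
    url="${2:-http://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd}"
    cd "$scripts_dir" || exit 1
    rm -f update_cygwin.cmd
    wget "$url" || exit 1        # http rather than https, since wget may fail on https
    chmod +x update_cygwin.cmd
    ./update_cygwin.cmd
)

# Usage (not run here):
#   refresh_update_cygwin
```

Log off as root once the script has finished.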
> >
> > -Andy
> >
> > On Mon, Aug 13, 2012 at 1:49 PM, Andy Kurth <an...@ncsu.edu> wrote:
> > > I believe there is a bug in the latest version of Cygwin which is
> > > causing update_cygwin.cmd to fail.  As a result, the computer being
> > > loaded never responds to SSH.  When ssh-keygen.exe is run from a
> > > normal non-Cygwin command prompt, the following occurs:
> > >
> > > C:\cygwin\home\root\VCL\Scripts>C:\Cygwin\bin\ssh-keygen.exe -t rsa1 -f C:\cygwin\etc\ssh_host_key -N ""
> > > Generating public/private rsa1 key pair.
> > >       8 [main] ssh-keygen 224 exception::handle: Exception: STATUS_ACCESS_VIOLATION
> > >    2114 [main] ssh-keygen 224 open_stackdumpfile: Dumping stack trace to ssh-keygen.exe.stackdump
> > >   61325 [main] ssh-keygen 224 exception::handle: Exception: STATUS_ACCESS_VIOLATION
> > >   68272 [main] ssh-keygen 224 exception::handle: Error while dumping state (probably corrupted stack)
> > >
> > > Running rebaseall doesn't help. The command succeeds if run from a
> > > Cygwin shell.  I just committed an update to update_cygwin.cmd to wrap
> > > the ssh-keygen.exe commands in "bash.exe -c".
> > > (https://issues.apache.org/jira/browse/VCL-616)
> > >
> > > You're going to have to update the file on the management node and in
> > > any images which were captured but aren't loading:
> > >
> > > On the management node:
> > > * cd /usr/local/vcl/tools/Windows/Scripts
> > > * rm -f update_cygwin.cmd
> > > * wget https://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd
> > >
> > > For images which aren't loading correctly, update_cygwin.cmd will need
> > > to be updated within the image and then a new revision of the VCL
> > > image must be created.
> > >
> > > * Make an imaging reservation for the problematic image.
> > > * Watch the console as the image is being loaded.  Assuming you're
> > > using ESXi, view the Console tab from the vSphere Client.  You should
> > > see the VM being powered on, the root account automatically logs in,
> > > runs a few scripts, and then logs off.
> > > * After root is automatically logged off, manually log in as root.
> > > The password will be the value of WINDOWS_ROOT_PASSWORD configured in
> > > /etc/vcl/vcld.conf.
> > > * Once logged in as root, open the Cygwin shell.
> > > * cd ~/VCL/Scripts
> > > * rm -f update_cygwin.cmd
> > > * wget https://svn.apache.org/repos/asf/vcl/trunk/managementnode/tools/Windows/Scripts/update_cygwin.cmd
> > > * Manually run update_cygwin.cmd: ./update_cygwin.cmd
> > >
> > > The vcld process should still be running and waiting for the computer
> > > to respond to SSH (you have 900 seconds).  When you run
> > > update_cygwin.cmd, the computer should begin responding and the
> > > reservation should finish loading.  You should be able to log in
> > > normally from the information on the Current Reservations page.  Save
> > > a new revision of the image.  It should be saved with the updated copy
> > > of update_cygwin.cmd which was downloaded to the management node.
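
The wait described above can be approximated by hand from the management
node. A rough sketch, assuming a generic polling helper: wait_until is a
hypothetical name, and vcld's own loop is implemented in Perl and is not
reproduced here. The ssh options in the usage comment are copied from the
vcld.log excerpts in this thread, and the 15 second interval is an arbitrary
choice.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or roughly TIMEOUT_SECONDS have been
# spent sleeping. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout_s="$1"
    interval_s="$2"
    shift 2
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        waited=$((waited + interval_s))
    done
    return 0
}

# Usage (not run here), polling the host seen in the logs for up to 900 seconds:
#   wait_until 900 15 /usr/bin/ssh -i /etc/vcl/vcl.key \
#       -o StrictHostKeyChecking=no -l root -p 22 -x vmwg0-120-57 true
```

The helper only counts time spent sleeping, so a slow ssh attempt can stretch
the real wait beyond the nominal timeout.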
> > >
> > > -Andy
> > >
> > >
> > > On Fri, Aug 3, 2012 at 12:25 PM, Basilio, Norvin <nb...@odu.edu> wrote:
> > >> I am also experiencing this issue when using Cygwin 1.7. I've run
> > >> "update_cygwin.cmd" manually and saw that it's unable to regenerate the
> > >> keys. I decided to try capturing my image using the older Cygwin 1.5,
> > >> and update_cygwin.cmd was able to regenerate the keys correctly, allowing
> > >> the reload process to complete.
> > >>
> > >> Norvin Basilio
> > >> nbasilio@odu.edu
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: Hechler, Adam [mailto:hechla@rpi.edu]
> > >> Sent: Friday, August 03, 2012 12:14 PM
> > >> To: user@vcl.apache.org
> > >> Subject: RE: New capture failed atttempting Windows post-load tasks
> > >>
> > >> Hello again,
> > >>
> > >> So, I walked out of the office last night thinking that my re-capture
> > >> was running smoothly. It got about 20 minutes in, I think, and then
> > >> failed. I think this is the section containing the fatal error: it
> > >> failed to configure the firewall to allow SSH. Is that my problem?
> > >> If so, any idea how to correct that?
> > >>
> > >> Thanks,
> > >> Adam
> > >>
> > >> ----
> > >>
> > >> 2012-08-02 17:39:37|24852|125:125|image|Windows.pm:firewall_enable_ssh_private(4633)|SSH will be enabled on private interface: Local Area Connection 3
> > >> 2012-08-02 17:39:37|24852|125:125|image|utils.pm:run_ssh_command(5380)|executing SSH command on vmwg0-120-57:
> > >> |24852|125:125|image| /usr/bin/ssh -i /etc/vcl/vcl.key  -o StrictHostKeyChecking=no -l root -p 22 -x vmwg0-120-57 'C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 interface = "Local Area Connection 4" ;C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 profile = ALL ;C:/Windows/System32/netsh.exe firewall set portopening name = "Cygwin SSHD" protocol = TCP port = 22 mode = ENABLE interface = "Local Area Connection 3"' 2>&1
> > >> 2012-08-02 17:39:42|24852|125:125|image|utils.pm:run_ssh_command(5464)|run_ssh_command output:
> > >> |24852|125:125|image| The interface was not found.
> > >> |24852|125:125|image| Ok.
> > >> |24852|125:125|image| The interface was not found.
> > >> 2012-08-02 17:39:42|24852|125:125|image|utils.pm:run_ssh_command(5474)|SSH command executed on vmwg0-120-57, command:
> > >> |24852|125:125|image| /usr/bin/ssh -i /etc/vcl/vcl.key  -o StrictHostKeyChecking=no -l root -p 22 -x vmwg0-120-57 'C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 interface = "Local Area Connection 4" ;C:/Windows/System32/netsh.exe firewall delete portopening protocol = TCP port = 22 profile = ALL ;C:/Windows/System32/netsh.exe firewall set portopening name = "Cygwin SSHD" protocol = TCP port = 22 mode = ENABLE interface = "Local Area Connection 3"' 2>&1
> > >> |24852|125:125|image| returning (1, "The interface was not found. O...")
> > >> |24852|125:125|image| ---- WARNING ----
> > >> |24852|125:125|image| 2012-08-02 17:39:42|24852|125:125|image|Windows.pm:firewall_enable_ssh_private(4665)|failed to configure firewall to allow SSH on private interface, exit status: 1, output:
> > >> |24852|125:125|image| The interface was not found. Ok. The interface was not found.
> > >> |24852|125:125|image| ( 0) Windows.pm, firewall_enable_ssh_private (line: 4665)
> > >> |24852|125:125|image| (-1) Windows.pm, reboot (line: 3335)
> > >> |24852|125:125|image| (-2) Windows.pm, disable_pagefile (line: 2077)
> > >> |24852|125:125|image| (-3) Windows.pm, pre_capture (line: 474)
> > >> |24852|125:125|image| (-4) Version_5.pm, pre_capture (line: 105)
> > >> |24852|125:125|image| (-5) VMware.pm, capture (line: 556)
> > >> |24852|125:125|image| ---- WARNING ----
> > >> |24852|125:125|image| 2012-08-02 17:39:42|24852|125:125|image|Windows.pm:reboot(3336)|reboot not attempted, failed to enable ssh from private IP addresses
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: Hechler, Adam [mailto:hechla@rpi.edu]
> > >>> Sent: Thursday, August 02, 2012 4:38 PM
> > >>> To: user@vcl.apache.org
> > >>> Subject: RE: New capture failed atttempting Windows post-load tasks
> > >>>
> > >>> Thanks Dmitri,
> > >>>
> > >>> I was able to ssh to the vm from the management node before I captured.
> > >>>
> > >>> Curious.. because I never thought about it before... I can re-capture
> > >>> an existing vm that's already been captured? I guess it makes logical
> > >>> sense. It's still just a vm existing in VMWare Server.
> > >>>
> > >>> I'll give that a try.
> > >>>
> > >>> Adam
> > >>>
> > >>> > -----Original Message-----
> > >>> > From: dchebota@gmu.edu [mailto:dchebota@gmu.edu]
> > >>> > Sent: Thursday, August 02, 2012 4:35 PM
> > >>> > To: user@vcl.apache.org
> > >>> > Subject: Re: New capture failed atttempting Windows post-load tasks
> > >>> >
> > >>> > Adam
> > >>> >
> > >>> > Were you able to 'ssh -i /etc/vcl/vcl.key image-computer-name'
> > >>> > before you captured the image?
> > >>> >
> > >>> > Yes, it seems like a good idea to redo the ssh config, run
> > >>> > gen-node-key.sh from the management node and re-capture the image.
> > >>> > You will have a new image under Manage Images and can delete the old
> > >>> > image which is not working.
> > >>> >
> > >>> > Reboot the image before you start the capture to make sure Cygwin SSH
> > >>> > starts up.
> > >>> >
> > >>> > Thanks
> > >>> >
> > >>> >
> > >>> > On Aug 2, 2012, at 16:18 , "Hechler, Adam" <he...@rpi.edu> wrote:
> > >>> >
> > >>> > > Hi Dmitri,
> > >>> > >
> > >>> > > I tried that and it's not working. I even went into Cygwin and
> > >>> > > tried to manually start sshd from in there and it's giving me the
> > >>> > > following error messages:
> > >>> > >
> > >>> > > Could not load host key: /etc/ssh_host_rsa_key
> > >>> > > Could not load host key: /etc/ssh_host_dsa_key
> > >>> > > Could not load host key: /etc/ssh_host_ecdsa_key
> > >>> > > Disabling protocol version 2. Could not load host key
> > >>> > > sshd: no hostkeys available -- exiting.
> > >>> > >
> > >>> > > When I check in etc, there are files for the host keys but they're
> > >>> > > empty now.  When I check the sshd log there's a bunch of entries
> > >>> > > showing that it matched host keys and then three sets of "no host keys
> > >>> > > available" at the bottom of the log (presumably from my last three
> > >>> > > attempts to start sshd beginning with the reload).
> > >>> > >
> > >>> > > Can I just run cygwin-sshd-config.sh again on the vm and then
> > >>> > > run gen-node-key again on the management node?  It's already been
> > >>> > > captured so I'm not sure if that would cause havoc.
> > >>> > >
> > >>> > > Adam
> > >>> > >
> > >>> > >
> > >>> > >> -----Original Message-----
> > >>> > >> From: dchebota@gmu.edu [mailto:dchebota@gmu.edu]
> > >>> > >> Sent: Thursday, August 02, 2012 4:03 PM
> > >>> > >> To: user@vcl.apache.org
> > >>> > >> Subject: Re: New capture failed atttempting Windows post-load
> > >>> > >> tasks
> > >>> > >>
> > >>> > >> Hi Adam
> > >>> > >>
> > >>> > >> Once you connect to Windows XP using VI client, can you start the
> > >>> > >> Cygwin SSH service manually under Control Panel -> Services?
> > >>> > >>
> > >>> > >> Thanks.
> > >>> > >> On Aug 2, 2012, at 15:34 , "Hechler, Adam" <he...@rpi.edu> wrote:
> > >>> > >>
> > >>> > >>> Hi again,
> > >>> > >>>
> > >>> > >>> So after getting the new sshd-config file this morning, I
> > >>> > >>> configured it and all seemed good. I then attempted to capture my
> > >>> > >>> base image. The capture itself completed successfully but then I
> > >>> > >>> got an error that the reload process failed right after this:
> > >>> > >>>
> > >>> > >>> 2012-08-02 12:23:18|21124|124:124|reload|Windows.pm:post_load(583)|beginning Windows post-load tasks on vmwg0-120-57
> > >>> > >>>
> > >>> > >>> After numerous attempts (about 107) to connect to SSH it finally
> > >>> > >>> failed, reporting:
> > >>> > >>>
> > >>> > >>> 2012-08-02 12:38:35|21124|124:124|reload|Module.pm:code_loop_timeout(767)|waiting for vmwg0-120-57 to respond to SSH, code did not return true after waiting 900 seconds
> > >>> > >>>
> > >>> > >>> Since it didn't finish the post-load tasks I was still able to
> > >>> > >>> log in as root to my Windows XP image using the VI Client console.
> > >>> > >>> I opened Cygwin and typed ps -ef looking to see if sshd was running
> > >>> > >>> but it's not. The only processes running are ps, bash and mintty.
> > >>> > >>> Should I be able to see if sshd is running using this method of
> > >>> > >>> checking? I know about ps -ef from very limited unix interactions
> > >>> > >>> so I thought I'd try it.
> > >>> > >>>
> > >>> > >>> I know that in the past, when sshd didn't start (before capturing
> > >>> > >>> into VCL) I would have to open a cmd prompt and run the rebaseall,
> > >>> > >>> but it looks like that cmd file gets deleted during the capture,
> > >>> > >>> because it's no longer in C:\cygwin\home\root which is where it
> > >>> > >>> used to be. I was thinking I would just try to run that again.
> > >>> > >>>
> > >>> > >>> Any clues?
> > >>> > >>>
> > >>> > >>> Thanks,
> > >>> > >>> Adam
> > >>> > >>>
> > >>> > >>>
> > >>> > >>>
> > >>> > >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > >>> > >>> Adam Hechler
> > >>> > >>> Senior Analyst / PC Systems Administrator
> > >>> > >>> Rensselaer Polytechnic Institute
> > >>> > >>> 275 Windsor Street
> > >>> > >>> Hartford, CT 06120 USA
> > >>> > >>> Ph: 860-548-2446
> > >>> > >>> Email: hechla@rpi.edu
> > >>> > >>> Web: http://www.ewp.rpi.edu
> > >>> > >>>
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >> --
> > >>> > >> Thank you,
> > >>> > >>
> > >>> > >> Dmitri Chebotarov
> > >>> > >> Virtual Computing Lab Systems Engineer, TSD - Ent Servers &
> > >>> > >> Messaging
> > >>> > >> 223 Aquia Building, Ffx, MSN: 1B5
> > >>> > >> Phone: (703) 993-6175
> > >>> > >> Fax: (703) 993-3404
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > Thank you,
> > >>> >
> > >>> > Dmitri Chebotarov
> > >>> > Virtual Computing Lab Systems Engineer, TSD - Ent Servers &
> > >>> > Messaging
> > >>> > 223 Aquia Building, Ffx, MSN: 1B5
> > >>> > Phone: (703) 993-6175
> > >>> > Fax: (703) 993-3404
> > >>> >
> > >>> >
> > >>> >
> > >>
> > >>
> > >>
> > >> --
> > >>
>