Posted to dev@vcl.apache.org by Andy Kurth <an...@ncsu.edu> on 2010/03/02 16:31:01 UTC

Re: Base image capture failure

You will need to watch the VM console after the VM is turned on in order to 
troubleshoot this.  You should see the following:

-VM is turned on
-Sysprep minisetup runs, VM is rebooted
-When Windows boots up for the first time, the root account is automatically 
logged on
-A few black command boxes appear on the desktop, the one in the back is named 
post_load.cmd
-When the command boxes close, root is logged off
-At this point, the computer should respond to SSH
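The last step above is the one vcld polls for from the management node (the "sshd is NOT active on vmguest-1 yet" lines in the log later in this thread). To watch for it by hand instead of retrying ssh, a minimal Python 3 sketch could look like the one below. The name wait_for_ssh and its defaults (40 attempts, 5 seconds apart) are illustrative choices, not vcld's actual code. A plain TCP connect only shows that something is listening on port 22, not that key authentication will work.

```python
import socket
import time


def wait_for_ssh(host, port=22, attempts=40, delay=5.0, timeout=3.0):
    """Return the 1-based attempt on which a TCP connection to host:port
    succeeded, or None if every attempt failed."""
    for attempt in range(1, attempts + 1):
        try:
            # Only checks that something accepts connections on the port.
            with socket.create_connection((host, port), timeout=timeout):
                return attempt
        except OSError:
            # Connection refused, timeouts and name-resolution failures are
            # all OSError subclasses; treat them all as "not ready yet".
            if attempt < attempts:
                time.sleep(delay)
    return None
```

For example, wait_for_ssh("vmguest-1") returns the attempt number on which port 22 first accepted a connection, or None after the last attempt. A "connection refused" result means the VM is up but nothing is listening on port 22 yet, which points at SSHD rather than at the network.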


You should be able to log on as root via the VMware console.  The password 
should be the one configured as WINDOWS_ROOT_PASSWORD in /etc/vcl/vcld.conf.  After 
logging in, view the log files generated by the VCL scripts.  All of the output 
generated by the scripts gets saved into files in C:\cygwin\home\root\VCL\Logs.

The troubleshooting steps depend largely on whether or not you see root being 
automatically logged on.

If root is not logged on automatically, the problem can probably be found in 
sysprep_cmdlines.log and the files in the Logs\sysprep_cmdlines directory.  These 
files are generated during the Sysprep minisetup stage when 
Scripts\sysprep_cmdlines.cmd runs.  This script configures root's autologon and 
sets a registry key to cause Scripts\post_load.cmd to run after root is 
automatically logged on.

If Windows is attempting to log on as root but failing because of a credentials problem, 
the cause could be that the password was not correctly configured in 
Scripts\autologon_enable.cmd.  Check the "set PASSWORD=" line in this file.
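A quick way to rule out the credentials problem is to compare the two values directly. This is a hypothetical Python sketch: it assumes vcld.conf uses plain KEY=value lines and that autologon_enable.cmd sets the password with a single "set PASSWORD=..." line as described above, and the helper names are made up for illustration. It deliberately does not strip trailing spaces from the .cmd value, because cmd.exe keeps them as part of the variable.

```python
import re


def read_vcld_conf_value(conf_text, key):
    """Return the value of a KEY=value line in vcld.conf text, or None."""
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def read_cmd_set_value(cmd_text, var):
    """Return the value from a 'set VAR=value' line in a .cmd script, or None."""
    pattern = re.compile(r"^\s*set\s+" + re.escape(var) + r"=(.*)$", re.IGNORECASE)
    for line in cmd_text.splitlines():
        m = pattern.match(line)
        if m:
            # cmd.exe keeps trailing spaces as part of the value: do not strip.
            return m.group(1)
    return None


def passwords_match(vcld_conf_text, autologon_cmd_text):
    """True if WINDOWS_ROOT_PASSWORD equals the PASSWORD set in the script."""
    expected = read_vcld_conf_value(vcld_conf_text, "WINDOWS_ROOT_PASSWORD")
    actual = read_cmd_set_value(autologon_cmd_text, "PASSWORD")
    return expected is not None and expected == actual
```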

If root is being logged on, first check if the Cygwin SSHD service is running 
and if the firewall has an exception for TCP port 22.  Be sure to check both the 
middle "Exceptions" tab and the settings for each adapter under the "Advanced" 
tab for the exception.  My guess is that SSHD failed to start.  The problem can 
probably be found in Logs\post_load.log and in the files in the Logs\post_load 
directory.  Check Logs\update_cygwin.cmd for errors.

As you'll see in the log files, there's a lot that has to happen in order for 
everything to work correctly.  The output from the log files will be helpful in 
order to figure this out.
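With many log files under C:\cygwin\home\root\VCL\Logs, it can help to list only the lines that report a failure. The sketch below is a hypothetical helper, not part of VCL. It assumes the status phrases that appear in the logs quoted in this thread ("exit code: 0", "errorlevel: 1", "returning exit status 1") and needs a Python 3 interpreter that can read the Logs directory (for example after copying it off the VM).

```python
import re
from pathlib import Path

# Status phrases seen in the VCL script logs quoted in this thread.
STATUS_RE = re.compile(r"(exit code|errorlevel|exit status)\s*:?\s*(-?\d+)", re.IGNORECASE)


def find_failures(log_dir):
    """Yield (path, line_number, line) for every line reporting a non-zero status."""
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            m = STATUS_RE.search(line)
            if m and int(m.group(2)) != 0:
                yield path, lineno, line.strip()
```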

Regards,
Andy

On 2/17/2010 7:22 PM, Terry McGuire wrote:
> Well, I think the base image is officially captured, but I don't seem to be able to quite make it work.  I've repeated the capture a few times and always end up in a situation where, when I make a reservation for the image, the image loads on the VM and various other useful-looking things happen, but ends before the reservation is made available to me with this error:
>
> ______________________
> 2010-02-17 17:01:23|16589|3:8|new|vmware.pm:load(848)|vmguest-1 ROUND 1 checks loop 19 of 40
> 2010-02-17 17:01:23|16589|3:8|new|utils.pm:run_ssh_command(6180)|executing SSH command on localvmhost:
> |16589|3:8|new| /usr/bin/ssh -i /etc/vcl/vcl.key  -l root -p 22 -x localvmhost 'vmware-cmd /var/lib/vmware/Virtual\ Machines/vmwarewinxp-base7-v0vmguest-1/vmwarewinxp-base7-v0vmguest-1.vmx getstate' 2>&1
> 2010-02-17 17:01:24|16589|3:8|new|utils.pm:run_ssh_command(6262)|run_ssh_command output:
> |16589|3:8|new| getstate() = on
> 2010-02-17 17:01:24|16589|3:8|new|utils.pm:run_ssh_command(6276)|SSH command executed on localvmhost, returning (0, "getstate() = on")
> 2010-02-17 17:01:24|16589|3:8|new|vmware.pm:load(852)|rechecking state of vm vmguest-1 /var/lib/vmware/Virtual\ Machines/vmwarewinxp-base7-v0vmguest-1/vmwarewinxp-base7-v0vmguest-1.vmx
> 2010-02-17 17:01:24|16589|3:8|new|vmware.pm:load(857)|vm vmguest-1 reports on
> 2010-02-17 17:01:24|16589|3:8|new|vmware.pm:load(868)|sshd is NOT active on vmguest-1 yet
> ____________________
>
> It tries for a long time to ssh into the machine, but doesn't succeed.  I see in the vmware server console that the vm is up and running, but it can't be sshed into.  When I try it from the management node's command line, I get "connection refused".  Obviously, it *was* working, so I guess something went screwy in the capture process, yes?  But, well, I haven't been able to figure out what.  Thus, yet another message out to you.
>
> Ideas?
>
> Terry
>
> On 10 Feb 2010, at 0940h, Andy Kurth wrote:
>
>> It looks like the image capture was successful and the vmware.pm module had trouble changing the file names to the new image name.  I don't think it was the result of renaming the VM directory.  You had the right idea by changing it to match the reservation ID.  I think the problem has to do with the original names of the .vmdk files which were named after the manually created VM.  What are the contents of /install/vmware_images/vmwarewinxp-base7-v0/?
>>
>> At this point I would manually fix the captured VM files.  The .vmdk files should be named vmwarewinxp-base7-v0-s00x.vmdk.  Rename all of the .vmdk files in the /install/vmware_images/vmwarewinxp-base7-v0/ directory to match this format.  Change the first part of the names but keep the 's00x.vmdk' as they are named now.
>>
>> There should be one .vmdk file without the 's00x' part.  This should now be named vmwarewinxp-base7-v0.vmdk.  This file needs to be edited because it contains the names of the other .vmdk files.  You should see an "Extent description" section in the file with the original names.  Change each line to include 'vmwarewinxp-base7-v0-s00x.vmdk' instead of the old name.
>>
>> Next, make sure the VCL 'deleted' column in the image and imagerevision tables for this image is set to 0.  In the image table, check id=7.  You'll have to look at the imagerevision table to figure out which one is for this revision. The imagerevision.imagename value will be vmwarewinxp-base7-v0.
>>
>> Next, make sure there isn't a directory named '/var/lib/vmware/Virtual Machines/vmwarewinxp-base7-v0'.  There shouldn't be one but check to make sure.  If it exists, rename it for now.
>>
>> Next, cross your fingers and try to make a reservation for this image.  If you created and configured multiple VMs in VCL then another one should already be in the available state and you should be able to make a reservation.  If not, change the state of your VM to 'available' via Manage Computers.
>>
>> If you have trouble, the following will be useful:
>> $ ls -l /install/vmware_images
>> $ ls -l /install/vmware_images/vmwarewinxp-base7-v0
>> $ ls -l /var/lib/vmware/Virtual\ Machines/
>> $ cat /install/vmware_images/vmwarewinxp-base7-v0/vmwarewinxp-base7-v0.vmdk
>>
>> I suspect a problem with the instructions caused this latest issue.  I'll go through them.  Stating the obvious, we need a much better way to create base image reservations.
>
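The rename-and-edit procedure in the 10 Feb message can also be scripted. The following Python 3 sketch is hypothetical: it assumes the split-disk naming Andy describes (one descriptor .vmdk plus NAME-s001.vmdk, NAME-s002.vmdk, ...) and that each extent line in the descriptor quotes its file name, for example RW 4192256 SPARSE "oldname-s001.vmdk". Back up the image directory before running anything like it.

```python
import re
from pathlib import Path

# Split-disk extent files end in -s001.vmdk, -s002.vmdk, ...
EXTENT_SUFFIX = re.compile(r"-(s\d{3})\.vmdk$")


def new_vmdk_name(filename, new_base):
    """Map 'old-s001.vmdk' to 'NEW-s001.vmdk'; the descriptor becomes 'NEW.vmdk'."""
    m = EXTENT_SUFFIX.search(filename)
    return f"{new_base}-{m.group(1)}.vmdk" if m else f"{new_base}.vmdk"


def rewrite_extents(descriptor_text, new_base):
    """Replace every quoted '...-s00x.vmdk' file name in the descriptor text."""
    return re.sub(
        r'"[^"]*-(s\d{3})\.vmdk"',
        lambda m: f'"{new_base}-{m.group(1)}.vmdk"',
        descriptor_text,
    )


def plan_renames(image_dir, new_base):
    """Dry run: return [(old_path, new_path), ...] without touching anything."""
    paths = sorted(Path(image_dir).glob("*.vmdk"))
    descriptors = [p for p in paths if not EXTENT_SUFFIX.search(p.name)]
    if len(descriptors) != 1:
        raise ValueError(f"expected one descriptor .vmdk, found {len(descriptors)}")
    return [(p, p.with_name(new_vmdk_name(p.name, new_base))) for p in paths]


def apply_renames(image_dir, new_base):
    """Rename the files, then fix the extent names inside the descriptor."""
    plan = plan_renames(image_dir, new_base)
    for old, new in plan:
        if old != new:
            old.rename(new)
    descriptor = Path(image_dir) / f"{new_base}.vmdk"
    descriptor.write_text(rewrite_extents(descriptor.read_text(), new_base))
    return plan
```

For example, apply_renames("/install/vmware_images/vmwarewinxp-base7-v0", "vmwarewinxp-base7-v0") would rename the files and rewrite the extent lines; calling plan_renames alone first works as a dry run.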

Re: Only one vm working

Posted by Terry McGuire <tm...@ualberta.ca>.
Hi Andy.  A breakthrough!  I now have multiple vm guests working!  Yay!  Weirdly, I still don't have the *first* vm guest - vmguest-1 - working, but at this point I don't really care, as this is all just for evaluation purposes anyway.  The details, for the record:

The problem with vmguest-3, and what caused the new error in the vcld log:
____________

Failed to resolve given hostname/IP: vmguest-3.  Note that you can't use '/mask' AND '1-4,7,100-' style IP ranges
WARNING: No targets were specified, so 0 hosts scanned.
___________

...was that I had indeed forgotten to add vmguest-3 to /etc/hosts.  Adding it made the above error go away and, miraculously, also made ssh successful.
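The error shows nmap being handed the bare name vmguest-3, so a VM missing from /etc/hosts (and from DNS) on the management node produces exactly this message. A small hypothetical Python check for that, assuming the standard /etc/hosts layout (address followed by one or more names, with # starting a comment); the VM names are just the ones used in this thread:

```python
def parse_hosts(text):
    """Map each hostname or alias in /etc/hosts-style text to its IP address."""
    names = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            names.setdefault(name, fields[0])  # first entry wins, as in lookups
    return names


def missing_hosts(hosts_text, vm_names):
    """Return the VM names that have no entry in the hosts text."""
    known = parse_hosts(hosts_text)
    return [name for name in vm_names if name not in known]
```

For example, reading /etc/hosts and calling missing_hosts(text, ["vmguest-1", "vmguest-2", "vmguest-3"]) before adding vmguest-3 would have returned ["vmguest-3"].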

It remains a mystery why vmguest-1 doesn't work, but since I now have multiple functioning vmguests I'm moving on to other challenges, namely, creating more images, both Windows and Linux, and adding users to the system.  A thousand thanks for your perseverance here Andy.  Hopefully, I'll be able to deal with the remaining challenges with a little less of your time and patience.

Regards,
Terry


Re: Only one vm working

Posted by Andy Kurth <an...@ncsu.edu>.
Hi Terry,
Is the image booting on vmguest-2 and 3 but SSH is failing, or is it not booting 
at all?  If it isn't booting, check the computer.drivetype values for the VMs. 
I came across an issue with this last week with another pilot attempting to 
create a Linux VMware base image.  If the value differs among the VMs, try 
swapping sda/hda and see what happens.

If the image is booting but SSH isn't responding, check the MAC addresses and IP 
addresses that are assigned to the VMs.  If it isn't receiving an IP address, do 
the private MAC addresses match dhcpd.conf and /etc/hosts?  Also, check the VM 
host to make sure you don't have multiple instances of a VM using the same MAC 
address.
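The MAC and address checks above can be sketched in Python. This is a hypothetical helper, not part of VCL. It assumes ISC dhcpd "host NAME { hardware ethernet ...; fixed-address ...; }" entries named after the VCL computer, /etc/hosts entries with the same name, and static MACs set in each .vmx with lines like ethernetN.address = "00:50:56:..."; VMs using generated MACs (ethernetN.generatedAddress) are not covered.

```python
import re
from collections import defaultdict

HOST_RE = re.compile(r"host\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")
IP_RE = re.compile(r"fixed-address\s+([^;\s]+)\s*;")
VMX_MAC_RE = re.compile(r'^\s*ethernet\d+\.address\s*=\s*"([^"]+)"', re.MULTILINE | re.IGNORECASE)


def parse_dhcpd_hosts(text):
    """Return {host: (mac, ip)} from dhcpd.conf 'host name { ... }' blocks."""
    entries = {}
    for name, body in HOST_RE.findall(text):
        mac, ip = MAC_RE.search(body), IP_RE.search(body)
        entries[name] = (mac.group(1).lower() if mac else None, ip.group(1) if ip else None)
    return entries


def parse_hosts(text):
    """Map each name in /etc/hosts-style text to its IP address."""
    names = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        for name in fields[1:]:
            names.setdefault(name, fields[0])
    return names


def duplicate_macs(vmx_texts):
    """vmx_texts: {vm_name: .vmx contents}. Return {mac: [vm, ...]} for shared MACs."""
    owners = defaultdict(list)
    for vm, text in vmx_texts.items():
        for mac in VMX_MAC_RE.findall(text):
            owners[mac.lower()].append(vm)
    return {mac: vms for mac, vms in owners.items() if len(vms) > 1}


def check_vm(name, dhcpd_text, hosts_text, vmx_text):
    """Return a list of problems for one VM; an empty list means consistent."""
    dhcp = parse_dhcpd_hosts(dhcpd_text).get(name)
    if dhcp is None:
        return [f"{name}: no host entry in dhcpd.conf"]
    problems = []
    mac, ip = dhcp
    vmx_macs = [m.lower() for m in VMX_MAC_RE.findall(vmx_text)]
    if mac not in vmx_macs:
        problems.append(f"{name}: dhcpd.conf MAC {mac} not found in .vmx {vmx_macs}")
    hosts_ip = parse_hosts(hosts_text).get(name)
    if hosts_ip != ip:
        problems.append(f"{name}: dhcpd.conf fixed-address {ip} != /etc/hosts {hosts_ip}")
    return problems
```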

I'm not sure what's causing the "Failed to resolve given hostname" error.  I'm 
guessing this is coming from the nmap command.  Was this error listed in 
vcld.log or did you see it somewhere else?  Please provide some lines leading up 
to this error if it's from the log.

Regards,
Andy

-- 
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_kurth@ncsu.edu
919.513.4090

Only one vm working [formerly: Base image capture failure]

Posted by Terry McGuire <tm...@ualberta.ca>.
Hi Andy (and anyone else following along here).  I've been doing a lot of poking around, and, long story short,  I can now (for the first time ever) successfully book and log into the Windows image (yay!) but, annoyingly, only with a single one of the vm guest computers I've configured.

While stumbling around in the dark, I decided to try setting up a Linux base image as well as the Windows one.  The process went much quicker, but, unfortunately, it seems to be getting hung up in a similar place to the Windows image, but that's not the interesting thing.  When I created the Linux image, I created a new vm guest to run it on ("vmguest-2").  When I got tired of playing with the Linux image, I switched back to the Windows image, and, to my amazement, it worked!  And then I realized that it was loading on vmguest-2.  Still didn't work on vmguest-1.  I created yet another vm - vmguest-3 - but it also won't work on it.  Only vmguest-2.  I can't quite figure out what's special about it.  I even swapped the private IP addresses, so vmguest-1 had vmguest-2's address, same result.  (And, with the wiki down at the moment, I can't get to the Linux base image documentation to see if there was something special about how I made the vm in the first place.)

As well, the errors I get are different on vmguest-1 and 3.  On 1, it can't ssh into the machine, as before.  On 3, it starts giving me these:
____________

Failed to resolve given hostname/IP: vmguest-3.  Note that you can't use '/mask' AND '1-4,7,100-' style IP ranges
WARNING: No targets were specified, so 0 hosts scanned.
____________

To my newbie eyes, all three vm computers are as identically configured in the VCL computer table as possible under the circumstances.

Another thing (though probably not related):  The machines all come up with 512MB memory, but I've set them to have 1024MB.  Clearly, I'm missing some config info somewhere.  

At this point it seems I have a useful situation for continued debugging:  a working setup, but only for the Windows image, and only for a single VM.  There's *gotta* be a way to figure out what's the difference making the difference.  I'm not worrying about the Linux image right now.  I figure, once I get Windows images running properly, I'll have a much easier time getting Linux working.

On a (related) side note, I see the list is getting much busier with newbies like me asking newbie questions.  A mixed blessing?  Obvious interest in the product, but a whole lot of support work for you, huh?  Once I actually have a clue, I fully intend to start contributing back, to help with this situation.

Terry

On 7 Apr 2010, at 1418h, Andy Kurth wrote:

> Is SSH working and is everything being processed by vcld to the point where you see the Connect button on the web page?  If you are just manually running the scripts then RDP won't be available because the firewall port isn't open.  vcld opens it later on in the process.
> 
> I have not seen this error before in the output of ipconfig as called from configure_networking.vbs:
> "An internal error occurred: The file name is too long."
> 
> I'm wondering if a problem occurred obtaining the IP address.  Can you run "ipconfig /all" manually and does this error show up?  If SSH is working correctly on the private interface, then I'm guessing there is a routing table problem.  There are no 129.x entries.  This seems odd.  Do any entries appear for 129.x in the routing table if you run "ipconfig /renew", then "route print"?
> 
> If vcld is completely loading the computer, then the problems that occur in configure_networking.vbs may not be the problem.  The output from the log file where "set_public_default_route" is called will be helpful.  The .vbs script attempts to set default routes but the vcld code does this again later on.



Re: Base image capture failure

Posted by Andy Kurth <an...@ncsu.edu>.
Is SSH working and is everything being processed by vcld to the point where you 
see the Connect button on the web page?  If you are just manually running the 
scripts then RDP won't be available because the firewall port isn't open.  vcld 
opens it later on in the process.

I have not seen this error before in the output of ipconfig as called from 
configure_networking.vbs:
"An internal error occurred: The file name is too long."

I'm wondering if a problem occurred obtaining the IP address.  Can you run 
"ipconfig /all" manually and does this error show up?  If SSH is working 
correctly on the private interface, then I'm guessing there is a routing table 
problem.  There are no 129.x entries.  This seems odd.  Do any entries appear 
for 129.x in the routing table if you run "ipconfig /renew", then "route print"?

If vcld is completely loading the computer, then the problems that occur in 
configure_networking.vbs may not be the problem.  The output from the log file 
where "set_public_default_route" is called will be helpful.  The .vbs script 
attempts to set default routes but the vcld code does this again later on.

-Andy


Re: Base image capture failure

Posted by Terry McGuire <tm...@ualberta.ca>.
On 18 Mar 2010, at 0742h, Andy Kurth wrote:

> Hi Terry,
> Sorry for the delay.  This information is helpful.  You're right, the root cause seems to be that sysprep_cmdlines.cmd isn't running.
> 
> I have seen the issue before where you can't enter a password.  This only seems to happen for the newer style logon screen, not the classic logon screen.  I'm not sure of the cause, but you can get to the classic logon screen by pressing Ctrl-Alt-Del twice.  Under the VMware console, press Ctrl-Alt-Insert twice.

Weirdly, sometimes I need the classic logon, sometimes I don't.  Whatever.  I use it when I need to.


> You can begin troubleshooting by examining the C:\Windows\setuplog.txt file.  There should be a few lines that look like the section I have copied to the end of this message.  Search setuplog.txt for "sysprep_cmdlines.cmd".  Does anything show up?

Yes, just like in your example, except it returns an exit code of 1.  That's bad, right?  Except, skipping ahead a bit, all is well when I apply the sysprep_cmdlines.cmd fix I figured out, exit code 0.  Keep reading.
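Searching setuplog.txt and seeing the lines around each hit, which may include the reported exit code, can be done with findstr or with a few lines of Python like the hypothetical helper below. It assumes the file can be read as plain text; undecodable bytes are replaced rather than raising an error.

```python
from pathlib import Path


def grep_file(path, needle, context=1):
    """Return a list of line groups: each hit plus `context` lines either side."""
    lines = Path(path).read_text(errors="replace").splitlines()
    hits = []
    for i, line in enumerate(lines):
        if needle.lower() in line.lower():  # case-insensitive match
            start, end = max(0, i - context), min(len(lines), i + context + 1)
            hits.append(lines[start:end])
    return hits
```

On the VM this would be called as grep_file(r"C:\Windows\setuplog.txt", "sysprep_cmdlines.cmd").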


> Next, examine the Sysprep files.  A copy of the same exact Sysprep files used when the image loaded should still be on the computer in C:\cygwin\home\root\VCL\Utilities\Sysprep.  This directory is copied to C:\Sysprep before an image is captured.  Sysprep automatically deletes C:\Sysprep when it finishes, so the VCL code copies everything to C:\cygwin\... and then makes an additional copy in C:\Sysprep so that the files are retained for troubleshooting.

Ah, that explains the "InstallFilesPath=C:\sysprep\i386" bit.


> The "InstallFilesPath=C:\sysprep\i386" line is correct.  Within the Sysprep directory, there should also be the following file:
> C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt
> 
> This InstallFilesPath line in sysprep.inf causes cmdlines.txt to be processed during minisetup.  Sysprep automatically calls the commands in cmdlines.txt before the computer boots Windows for the first time.  You should see a call to sysprep_cmdlines.cmd in cmdlines.txt.
> 
> So, make sure of the following:
> -cmdlines.txt resides in the location noted above
> -cmdlines.txt includes a line calling sysprep_cmdlines.cmd

This all looks good.
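The two cmdlines.txt checks can be automated. The hypothetical Python sketch below assumes cmdlines.txt uses the usual Sysprep layout, a [Commands] section with one quoted command per line, and only looks for the script name inside that section.

```python
from pathlib import Path

CMDLINES = Path(r"C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt")


def check_cmdlines(path=CMDLINES, script="sysprep_cmdlines.cmd"):
    """Return (ok, message): does the file exist and call `script` under [Commands]?"""
    path = Path(path)
    if not path.is_file():
        return False, f"{path} does not exist"
    in_commands = False
    for line in path.read_text(errors="replace").splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which INI-style section we are in.
            in_commands = stripped.lower() == "[commands]"
        elif in_commands and script.lower() in stripped.lower():
            return True, f"found: {stripped}"
    return False, f"no [Commands] line calling {script} in {path}"
```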


> You can troubleshoot this by manually running Sysprep.  But first, load your image by making an imaging reservation (Manage Images -> Create/Update image) rather than a normal reservation.  This is because VCL configures the VM to run in persistent mode for imaging reservations and nonpersistent mode for normal reservations.  If the VM is running in nonpersistent mode and you reboot the machine, it will likely restart in the initial hard drive state saved in the .vmdk files rather than the state the VM was in before it was rebooted.  If running in persistent mode, the VM's hard drive state is saved when it is rebooted.
> 
> Manually run Sysprep:
> -Log in as root
> -Copy the entire Sysprep directory under C:\cygwin to C:\
> -Copy the entire C:\cygwin\home\root\VCL\Drivers directory to C:\Sysprep
> -Delete C:\cygwin\home\root\VCL\Logs to replicate the original state
> -Run the command: "C:\Sysprep\sysprep.exe /quiet /reseal /mini /reboot"
> 
> You should see the computer reboot into the minisetup phase.  Towards the end of this phase, you should see some black command boxes appear then close.  This is when sysprep_cmdlines.cmd is being run.  It should then reboot again and automatically log on as root.

Same results - autologin doesn't happen.
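The manual Sysprep preparation steps could be scripted roughly as follows. This is a hypothetical Python 3.8+ sketch (dirs_exist_ok needs 3.8, so on an XP guest these steps would normally still be done by hand). It only mirrors the copy and delete steps listed above and returns the sysprep command line instead of running it. Keep in mind Terry's finding further down: sysprep_cmdlines.cmd cannot write its log if the Logs directory is missing.

```python
import shutil
from pathlib import Path


def prepare_manual_sysprep(vcl_root=r"C:\cygwin\home\root\VCL", system_drive="C:\\"):
    """Copy Sysprep to the drive root, copy Drivers into it, delete Logs.

    Returns the sysprep command as a list; it is not executed here."""
    vcl = Path(vcl_root)
    target = Path(system_drive) / "Sysprep"
    shutil.copytree(vcl / "Utilities" / "Sysprep", target, dirs_exist_ok=True)
    shutil.copytree(vcl / "Drivers", target / "Drivers", dirs_exist_ok=True)
    shutil.rmtree(vcl / "Logs", ignore_errors=True)  # replicate the original state
    return [str(target / "sysprep.exe"), "/quiet", "/reseal", "/mini", "/reboot"]
```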


> If you don't see the black boxes during minisetup and it doesn't autologon, try manually running the command contained within cmdlines.txt after Sysprep is done:
> -Log in as root
> -Delete C:\cygwin\home\root\VCL\Logs
> -Open cmd.exe
> -Run this command (1 line):
> cmd.exe /c C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd > C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log 2>&1

Logging in manually after running sysprep manually, I still don't have the Logs folder.  Running sysprep_cmdlines manually, entered exactly as you have it here, I get "The system cannot find the path specified."  Hmm.  If I manually create a Logs dir, then rerun the command, stuff happens.  Looking at the resulting sysprep_cmdlines.log file, it exits with a status of "0".  Ok, I then restart.  More promising looking things happen.  Autologin, many black boxes, then autologout.

I did examine the permissions for root, and all looked good.  I even applied the chmod as you suggested, but the problem persisted.  I've now tweaked the image to have the Logs folder pre-created, and things seem to work fine.  Autologin, many black boxes, and, when I make a reservation, it makes it all the way to "Connect!" without manual intervention.  However, unfortunately, I still can't login with RDC.
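The workaround Terry found, making sure the Logs directory exists before anything redirects output into it, is a general pattern. Here is a hypothetical Python 3 sketch of running a command with its output captured to a log file, creating the directory first; run_logged is an illustrative name, not a VCL function.

```python
import subprocess
from pathlib import Path


def run_logged(command, log_file):
    """Run `command` with stdout and stderr written to `log_file`.

    The log's parent directory is created first; a cmd.exe redirection into
    a missing directory is what produced "The system cannot find the path
    specified." in this thread. Returns the command's exit code."""
    log_path = Path(log_file)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "wb") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For the case above, the call would be run_logged(["cmd.exe", "/c", r"C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd"], r"C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log").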

Manually logging back in again via the console, and looking at the post_load log, I see it's exited with a 1.  Looking more closely, configure_networking.vbs is exiting with an "errorlevel: 1".  Looking at its log, I see this:

______________
configure_networking.vbs beginning to run: 3/23/2010 12:53:50 PM
Windows Version: 5.1.2600
---------------------------------------------------------------------------
12:54:26 PM
---------------------------------------------------------------------------
Printing routing table, command: cmd.exe /c %SystemRoot%\system32\route.exe print
===========================================================================
Interface List
0x1 ........................... MS TCP Loopback interface
0x2 ...00 50 56 00 00 00 ...... AMD PCNET Family PCI Ethernet Adapter #3 - Packet Scheduler Miniport
0x10004 ...00 50 56 00 00 01 ...... AMD PCNET Family PCI Ethernet Adapter #4 - Packet Scheduler Miniport
===========================================================================
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0      192.168.0.1     192.168.1.1	  30
        127.0.0.0        255.0.0.0        127.0.0.1       127.0.0.1	  1
      169.254.0.0      255.255.0.0  169.254.237.166  169.254.237.166	  30
  169.254.237.166  255.255.255.255        127.0.0.1       127.0.0.1	  30
  169.254.255.255  255.255.255.255  169.254.237.166  169.254.237.166	  30
      192.168.0.0      255.255.0.0      192.168.1.1     192.168.1.1	  30
      192.168.1.1  255.255.255.255        127.0.0.1       127.0.0.1	  30
    192.168.1.255  255.255.255.255      192.168.1.1     192.168.1.1	  30
        224.0.0.0        240.0.0.0  169.254.237.166  169.254.237.166	  30
        224.0.0.0        240.0.0.0      192.168.1.1     192.168.1.1	  30
  255.255.255.255  255.255.255.255  169.254.237.166  169.254.237.166	  1
  255.255.255.255  255.255.255.255      192.168.1.1     192.168.1.1	  1
Default Gateway:       192.168.0.1
===========================================================================
Persistent Routes:
  None
Printing routing table successful, exit code: 0
12:54:27 PM
---------------------------------------------------------------------------
12:54:27 PM
---------------------------------------------------------------------------
Running ipconfig /all, command: cmd.exe /c %SystemRoot%\system32\ipconfig.exe /all

Windows IP Configuration

An internal error occurred: The file name is too long.

Please contact Microsoft Product Support Services for further help.

Additional information: Unable to query host name.

Running ipconfig /all successful, exit code: 0
12:54:28 PM
----------------------------------------------------------------------
*** AMD PCNET Family PCI Ethernet Adapter (Index: 1) ***

Adpater name: AMD PCNET Family PCI Ethernet Adapter
Ignored adpater name section: 
Ignored adpater description section: 
IP address: 129.128.9.119
Matching VCL private address section: 
Matching non-public address section: 
* PUBLIC_NAME          = Local Area Connection
* DHCP enabled         = True
* PUBLIC_IP            = 129.128.9.119
* PUBLIC_SUBNET_MASK   = 255.255.254.0
* PUBLIC_GATEWAY       = 
* PUBLIC_DESCRIPTION   = AMD PCNET Family PCI Ethernet Adapter
----------------------------------------------------------------------
*** AMD PCNET Family PCI Ethernet Adapter (Index: 4) ***

Adpater name: AMD PCNET Family PCI Ethernet Adapter
Ignored adpater name section: 
Ignored adpater description section: 
IP address: 192.168.1.1
Matching VCL private address section: 
Matching non-public address section: 192.168
IP address is not a public nor valid VCL private address: 192.168.1.1
---------------------------------------------------------------------------
12:54:29 PM
---------------------------------------------------------------------------
PRIVATE_NAME          = 
PRIVATE_IP            = 
PRIVATE_SUBNET_MASK   = 
PRIVATE_GATEWAY       = 

PUBLIC_NAME           = Local Area Connection
PUBLIC_IP             = 129.128.9.119
PUBLIC_SUBNET_MASK    = 255.255.254.0
PUBLIC_GATEWAY        = 

Failed to retrieve private and public network configuration, returning exit status 1
_________________


Which looks bad to me.  Poking around a bit, it seems that configure_networking.vbs expects my private LAN to be 10.x.x.x, but I've set it up as 192.168.x.x, as per other documentation.  Is this relevant?
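To make the suspected mismatch concrete, here is a minimal Python sketch of the kind of address classification the log above seems to reflect. It is not taken from configure_networking.vbs. The idea that the VCL private network must be 10.0.0.0/8 is only Terry's hypothesis, encoded here as the hypothetical constant `VCL_PRIVATE_NET`. Under that assumption, 192.168.1.1 lands in neither the public bucket nor the VCL-private bucket, which matches the "IP address is not a public nor valid VCL private address" line.

```python
import ipaddress

# HYPOTHETICAL: Terry's guess at what configure_networking.vbs treats as the
# VCL private network. This is an assumption, not the script's real setting.
VCL_PRIVATE_NET = ipaddress.ip_network("10.0.0.0/8")


def classify(addr: str) -> str:
    """Sort an adapter address into the three buckets the log output implies."""
    ip = ipaddress.ip_address(addr)
    if ip in VCL_PRIVATE_NET:
        return "vcl-private"  # would be accepted as the private adapter
    if ip.is_private:
        return "non-public"   # e.g. 192.168.x.x: neither public nor VCL private
    return "public"           # e.g. 129.128.9.119: accepted as the public adapter


for address in ("129.128.9.119", "192.168.1.1"):
    print(address, classify(address))
# prints:
# 129.128.9.119 public
# 192.168.1.1 non-public
```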

To summarize this round of glitch-squishing: the sysprep_cmdlines.cmd issue, whose symptom was the failure to autologon after Sysprep ran, seems to be solved (or at least worked around) by pre-creating the Logs folder.  But I still can't connect via RDC after the reservation is made, which may be due to something going wrong in the configure_networking.vbs script.

Back to you, Andy (with continued gratefulness for your help).

Terry


Re: Base image capture failure

Posted by Andy Kurth <an...@ncsu.edu>.
Hi Terry,
Sorry for the delay.  This information is helpful.  You're right, the root cause 
seems to be that sysprep_cmdlines.cmd isn't running.

I have seen this issue before, where you can't enter a password.  It only seems 
to happen with the newer-style logon screen, not the classic logon screen.  I'm 
not sure of the cause, but you can get to the classic logon screen by pressing 
Ctrl-Alt-Del twice.  Under the VMware console, press Ctrl-Alt-Insert twice.

You can begin troubleshooting by examining the C:\Windows\setuplog.txt file.  There 
should be a few lines that look like the section I have copied to the end of 
this message.  Search setuplog.txt for "sysprep_cmdlines.cmd".  Does anything 
show up?
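If you'd rather script the search than eyeball the file, here is a minimal Python sketch. It assumes Python is installed on the VM (it is not part of VCL), and it reads setuplog.txt as latin-1 so decoding never fails; the file's actual encoding is an assumption, so change the `encoding` argument if the output looks garbled. The "returned exit code N" pattern comes from the excerpt at the end of this message.

```python
import re
from pathlib import Path

SETUPLOG = Path(r"C:\Windows\setuplog.txt")
EXIT_CODE = re.compile(r"returned exit code (\d+)")


def matching_lines(text: str, needle: str = "sysprep_cmdlines.cmd") -> list[str]:
    """Return every line that mentions the needle, ignoring case."""
    return [ln for ln in text.splitlines() if needle.lower() in ln.lower()]


def exit_codes(lines: list[str]) -> list[int]:
    """Extract the 'returned exit code N' value from each line that has one."""
    return [int(m.group(1)) for ln in lines if (m := EXIT_CODE.search(ln))]


if __name__ == "__main__":
    # latin-1 decodes any byte sequence; the real encoding is an assumption.
    hits = matching_lines(SETUPLOG.read_text(encoding="latin-1"))
    if not hits:
        print("sysprep_cmdlines.cmd never appears: Sysprep apparently did not run it")
    for line in hits:
        print(line)
    print("exit codes:", exit_codes(hits))
```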

Next, examine the Sysprep files.  A copy of the exact Sysprep files used 
when the image was loaded should still be on the computer in 
C:\cygwin\home\root\VCL\Utilities\Sysprep.  This directory is copied to 
C:\Sysprep before an image is captured.  Sysprep automatically deletes 
C:\Sysprep when it finishes, so the VCL code copies everything to C:\cygwin\... 
and then makes an additional copy in C:\Sysprep so that the files are retained 
for troubleshooting.

The "InstallFilesPath=C:\sysprep\i386" line is correct.  Within the Sysprep 
directory, there should also be the following file:
C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt

This InstallFilesPath line in sysprep.inf causes cmdlines.txt to be processed 
during minisetup.  Sysprep automatically calls the commands in cmdlines.txt 
before the computer boots Windows for the first time.  You should see a call to 
sysprep_cmdlines.cmd in cmdlines.txt.

So, make sure of the following:
-cmdlines.txt resides in the location noted above
-cmdlines.txt includes a line calling sysprep_cmdlines.cmd
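Both checks can also be sketched in Python. This assumes Python on the VM and the usual cmdlines.txt layout, in which commands are listed as quoted lines under a [Commands] section; the function name `calls_sysprep_cmdlines` is hypothetical.

```python
from pathlib import Path

CMDLINES = Path(r"C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt")


def calls_sysprep_cmdlines(text: str) -> bool:
    """True if a line in the [Commands] section mentions sysprep_cmdlines.cmd."""
    in_commands = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Track the current section; commands are assumed to live under [Commands].
            in_commands = line.lower() == "[commands]"
        elif in_commands and "sysprep_cmdlines.cmd" in line.lower():
            return True
    return False


if __name__ == "__main__":
    if not CMDLINES.is_file():
        print("cmdlines.txt is missing:", CMDLINES)
    elif calls_sysprep_cmdlines(CMDLINES.read_text(encoding="latin-1")):
        print("cmdlines.txt calls sysprep_cmdlines.cmd")
    else:
        print("cmdlines.txt does NOT call sysprep_cmdlines.cmd")
```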

You can troubleshoot this by manually running Sysprep.  But first, load your 
image by making an imaging reservation (Manage Images -> Create/Update image) 
rather than a normal reservation.  This is because VCL configures 
the VM to run in persistent mode for imaging reservations and nonpersistent mode 
for normal reservations.  If the VM is running in nonpersistent mode and you 
reboot it, it will likely restart in the initial hard drive state saved 
in the .vmdk files rather than the state it was in before the reboot.  If 
running in persistent mode, the VM's hard drive state is preserved across reboots.

Manually run Sysprep:
-Log in as root
-Copy the entire Sysprep directory under C:\cygwin to C:\
-Copy the entire C:\cygwin\home\root\VCL\Drivers directory to C:\Sysprep
-Delete C:\cygwin\home\root\VCL\Logs to replicate the original state
-Run the command: "C:\Sysprep\sysprep.exe /quiet /reseal /mini /reboot"
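The file-handling steps above could be scripted roughly as follows. This is a minimal Python sketch, assuming Python on the VM and the directory layout described in this message; the helper names are hypothetical. It only prints the Sysprep command unless `run=True`, because Sysprep will reseal and reboot the VM (do this from an imaging reservation, as noted above).

```python
import shutil
import subprocess
from pathlib import Path, PureWindowsPath

VCL_DIR = r"C:\cygwin\home\root\VCL"
SYSPREP_DIR = r"C:\Sysprep"


def sysprep_command(sysprep_dir: str = SYSPREP_DIR) -> list[str]:
    """Build the same Sysprep command line quoted in the steps above."""
    exe = PureWindowsPath(sysprep_dir, "sysprep.exe")
    return [str(exe), "/quiet", "/reseal", "/mini", "/reboot"]


def prepare_and_run(vcl_dir: str = VCL_DIR, sysprep_dir: str = SYSPREP_DIR,
                    run: bool = False) -> None:
    vcl, dst = Path(vcl_dir), Path(sysprep_dir)
    # Copy the retained Sysprep files back to C:\Sysprep.
    shutil.copytree(vcl / "Utilities" / "Sysprep", dst, dirs_exist_ok=True)
    # Copy the Drivers directory into C:\Sysprep\Drivers.
    shutil.copytree(vcl / "Drivers", dst / "Drivers", dirs_exist_ok=True)
    # Delete Logs to replicate the state right after an image load.
    shutil.rmtree(vcl / "Logs", ignore_errors=True)
    cmd = sysprep_command(sysprep_dir)
    if run:
        subprocess.run(cmd, check=True)  # reseals and reboots the VM
    else:
        print("Would run:", " ".join(cmd))
```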

You should see the computer reboot into the minisetup phase.  Towards the end of 
this phase, you should see some black command boxes appear then close.  This is 
when sysprep_cmdlines.cmd is being run.  It should then reboot again and 
automatically log on as root.

If you don't see the black boxes during minisetup and it doesn't autologon, try 
manually running the command contained within cmdlines.txt after Sysprep is done:
-Log in as root
-Delete C:\cygwin\home\root\VCL\Logs
-Open cmd.exe
-Run this command (1 line):
cmd.exe /c C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd > 
C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log 2>&1
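One detail may be worth checking here: as far as I know, cmd.exe cannot redirect output into a folder that does not exist, so if Logs has just been deleted, the redirection above may fail before sysprep_cmdlines.cmd even starts. That could explain why pre-creating the Logs folder helped in Terry's later message. The Python sketch below (assuming Python on the VM; the helper names are hypothetical) creates Logs first, then runs the script with stdout and stderr both sent to the log file, which is the equivalent of "> ... 2>&1".

```python
import subprocess
from pathlib import Path, PureWindowsPath

VCL_DIR = r"C:\cygwin\home\root\VCL"


def cmdlines_argv(vcl_dir: str = VCL_DIR) -> list[str]:
    """The command shown in setuplog.txt, minus the shell redirection."""
    script = PureWindowsPath(vcl_dir, "Scripts", "sysprep_cmdlines.cmd")
    return ["cmd.exe", "/c", str(script)]


def cmdlines_log(vcl_dir: str = VCL_DIR) -> str:
    """Where the redirected output is expected to go."""
    return str(PureWindowsPath(vcl_dir, "Logs", "sysprep_cmdlines.log"))


def run_sysprep_cmdlines(vcl_dir: str = VCL_DIR) -> int:
    log_path = Path(cmdlines_log(vcl_dir))
    # Create Logs first: opening a file in a missing folder fails here,
    # just as cmd.exe's "> ...\Logs\sysprep_cmdlines.log" would.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "wb") as log:
        # stdout goes to the log and stderr is merged into it ("2>&1").
        result = subprocess.run(cmdlines_argv(vcl_dir), stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    print("sysprep_cmdlines.cmd exit code:", run_sysprep_cmdlines())
```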

If things still aren't working, I'm wondering if it could be a permissions 
problem.  Make sure root is the owner of its home directory:
-Log in as root
-Open a Cygwin shell
-Run: "chown -R root:Administrators ~/"
-Try running Sysprep again

Hope this helps,
Andy


****************
setuplog.txt section showing where sysprep_cmdlines.cmd was run:

03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,SetUpVirtualMemory: loc 1
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,Setup configured the system to place a 384 MB pagefile on drive C:.
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,Crashdump was enabled.
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,SetUpVirtualMemory: EXIT (1)
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\syssetup.c,2725,,Power scheme: desktop.
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\syssetup.c,2729,,SetActivePwrScheme succeeded.
03/12/2010 15:08:21.343,d:\xpsp\base\ntsetup\syssetup\log.c,133,,The external program cmd.exe /c C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd > C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log 2>&1 returned exit code 0.
03/12/2010 15:08:21.343,d:\xpsp\base\ntsetup\syssetup\syssetup.c,4034,BEGIN_SECTION,Fixing up hives
03/12/2010 15:08:21.593,d:\xpsp\base\ntsetup\syssetup\syssetup.c,4041,END_SECTION,Fixing up hives

Re: Base image capture failure

Posted by Terry McGuire <tm...@ualberta.ca>.
Hi Andy.  First, let me thank you once again for sticking with this ongoing saga.  Next time I'm in North Carolina, I'll buy you a beer!

On 2 Mar 2010, at 0831h, Andy Kurth wrote:

> You will need to watch the VM console after the VM is turned on in order to troubleshoot this.  You should see the following:
> 
> -VM is turned on
> -Sysprep minisetup runs, VM is rebooted
> -When Windows boots up for the first time, the root account is automatically logged on
> -A few black command boxes appear on the desktop, the one in the back is named post_load.cmd
> -When the command boxes close, root is logged off
> -At this point, the computer should respond to SSH

I see the VM turn on, sysprep runs, VM reboots, but then after Windows loads it just stays at the login window.

> You should be able to log on as root via the VMware console.  The password should be the one configured as WINDOWS_ROOT_PASSWORD /etc/vcl/vcld.conf.  After logging in, view the log files generated by the VCL scripts.  All of the output generated by the scripts gets saved into files in C:\cygwin\home\root\VCL\Logs.

I can indeed login as root via the console, with the password I put in vcld.conf.  However, there is no log folder in C:\cygwin\home\root\vcl - just Drivers, Scripts, Security and Utilities.  Which, after poking around a bit, means that post_load has not run, yes?

If I run post_load manually, everything seems to move along nicely, but after it logs out, I can no longer login as root.  It doesn't let me type a password, even though it's asking for one.

Interestingly, if I make a reservation at this point, the reservation appears to be set up properly and is acknowledged through the web interface, but when I try to log in via RDC, it fails, acting as if there's no machine to talk to.  (And, yes, I'm trying to connect from the same machine I clicked the "Connect!" button on.)  The vcld log suggests all is well (there's a lot of it, as you know; let me know if I should send you any of it).

Trying to ssh or ping the vm on its public address fails, but that might be normal, yes?

> The troubleshooting steps depend largely on whether or not you see root being automatically logged on.
> 
> If root is not logged on automatically, the problem can probably be found in sysprep_cmdlines.log and the files in Logs\sysprep_cmdlines directory.  These files are generated during the Sysprep minisetup stage when Scripts\sysprep_cmdlines.cmd runs.  This script configures root's autologon and sets a registry key to cause Scripts\post_load.cmd to run after root is automatically logged on.


> If it's attempting to log on root but failing because of a credentials problem, the cause could be that the password was not correctly configured in Scripts\autologon_enable.cmd.  Check the "set PASSWORD=" line in this file.

The autologon_enable script has the correct password, and when I run it and then restart, autologon works.

> If root is being logged on, first check if the Cygwin SSHD service is running and if the firewall has an exception for TCP port 22.  Be sure to check both the middle "Exceptions" tab and the settings for each adapter under the "Advanced" tab for the exception.  My guess is that SSHD failed to start.  The problem can probably be found in Logs\post_load.log and in the files in the Logs\post_load directory.  Check Logs\update_cygwin.cmd for errors.
> 
> As you'll see in the log files, there's a lot that has to happen in order for everything to work correctly.  The output from the log files will be helpful in order to figure this out.

Poking around in the scripts folder, I see that this whole post-load series of events is contingent on Sysprep running sysprep_cmdlines, which perhaps it's not doing.  Does it matter that the sysprep.inf file includes "InstallFilesPath=C:\sysprep\i386"?  This doesn't seem right to me, but to change it I'd need to alter the base image, which frightens me.  So, I'll await your reply before trying anything that crazy.

Terry