Posted to dev@vcl.apache.org by Andy Kurth <an...@ncsu.edu> on 2010/03/02 16:31:01 UTC
Re: Base image capture failure
You will need to watch the VM console after the VM is turned on in order to
troubleshoot this. You should see the following:
-VM is turned on
-Sysprep minisetup runs, VM is rebooted
-When Windows boots up for the first time, the root account is automatically
logged on
-A few black command boxes appear on the desktop; the one in the back is named
post_load.cmd
-When the command boxes close, root is logged off
-At this point, the computer should respond to SSH
You should be able to log on as root via the VMware console. The password
should be the one configured as WINDOWS_ROOT_PASSWORD in /etc/vcl/vcld.conf. After
logging in, view the log files generated by the VCL scripts. All of the output
generated by the scripts gets saved into files in C:\cygwin\home\root\VCL\Logs.
The troubleshooting steps depend largely on whether or not you see root being
automatically logged on.
If root is not logged on automatically, the problem can probably be found in
sysprep_cmdlines.log and the files in the Logs\sysprep_cmdlines directory. These
files are generated during the Sysprep minisetup stage when
Scripts\sysprep_cmdlines.cmd runs. This script configures root's autologon and
sets a registry key to cause Scripts\post_load.cmd to run after root is
automatically logged on.
If it's attempting to log on root but failing because of a credentials problem,
the cause could be that the password was not correctly configured in
Scripts\autologon_enable.cmd. Check the "set PASSWORD=" line in this file.
If root is being logged on, first check if the Cygwin SSHD service is running
and if the firewall has an exception for TCP port 22. Be sure to check both the
middle "Exceptions" tab and the settings for each adapter under the "Advanced"
tab for the exception. My guess is that SSHD failed to start. The problem can
probably be found in Logs\post_load.log and in the files in the Logs\post_load
directory. Check Logs\update_cygwin.cmd for errors.
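The SSH check described above can also be made from the management node without waiting for vcld to retry. The following is a minimal sketch, not part of VCL; the function name tcp_port_open is made up for illustration. It only reports whether a TCP connection to port 22 can be opened, which fails if SSHD is down or if the firewall exception is missing.

```python
import socket

def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling tcp_port_open("vmguest-1") with your own guest name returns False for "connection refused" as well as for an unresolvable host name, so a False result still needs the log files to pin down the cause.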
As you'll see in the log files, a lot has to happen for everything to work
correctly. The output from the log files will help to figure this out.
Regards,
Andy
On 2/17/2010 7:22 PM, Terry McGuire wrote:
> Well, I think the base image is officially captured, but I don't seem to be able to quite make it work. I've repeated the capture a few times and always end up in a situation where, when I make a reservation for the image, the image loads on the VM and various other useful-looking things happen, but ends before the reservation is made available to me with this error:
>
> ______________________
> 2010-02-17 17:01:23|16589|3:8|new|vmware.pm:load(848)|vmguest-1 ROUND 1 checks loop 19 of 40
> 2010-02-17 17:01:23|16589|3:8|new|utils.pm:run_ssh_command(6180)|executing SSH command on localvmhost:
> |16589|3:8|new| /usr/bin/ssh -i /etc/vcl/vcl.key -l root -p 22 -x localvmhost 'vmware-cmd /var/lib/vmware/Virtual\ Machines/vmwarewinxp-base7-v0vmguest-1/vmwarewinxp-base7-v0vmguest-1.vmx getstate' 2>&1
> 2010-02-17 17:01:24|16589|3:8|new|utils.pm:run_ssh_command(6262)|run_ssh_command output:
> |16589|3:8|new| getstate() = on
> 2010-02-17 17:01:24|16589|3:8|new|utils.pm:run_ssh_command(6276)|SSH command executed on localvmhost, returning (0, "getstate() = on")
> 2010-02-17 17:01:24|16589|3:8|new|vmware.pm:load(852)|rechecking state of vm vmguest-1 /var/lib/vmware/Virtual\ Machines/vmwarewinxp-base7-v0vmguest-1/vmwarewinxp-base7-v0vmguest-1.vmx
> 2010-02-17 17:01:24|16589|3:8|new|vmware.pm:load(857)|vm vmguest-1 reports on
> 2010-02-17 17:01:24|16589|3:8|new|vmware.pm:load(868)|sshd is NOT active on vmguest-1 yet
> ____________________
>
> It tries for a long time to ssh into the machine, but doesn't succeed. I see in the vmware server console that the vm is up and running, but it can't be sshed into. When I try it from the management node's command line, I get "connection refused". Obviously, it *was* working, so I guess something went screwy in the capture process, yes? But, well, I haven't been able to figure out what. Thus, yet another message out to you.
>
> Ideas?
>
> Terry
>
> On 10 Feb 2010, at 0940h, Andy Kurth wrote:
>
>> It looks like the image capture was successful and the vmware.pm module had trouble changing the file names to the new image name. I don't think it was the result of renaming the VM directory. You had the right idea by changing it to match the reservation ID. I think the problem has to do with the original names of the .vmdk files which were named after the manually created VM. What are the contents of /install/vmware_images/vmwarewinxp-base7-v0/?
>>
>> At this point I would manually fix the captured VM files. The .vmdk files should be named vmwarewinxp-base7-v0-s00x.vmdk. Rename all of the .vmdk files in the /install/vmware_images/vmwarewinxp-base7-v0/ directory to match this format. Change the first part of the names but keep the 's00x.vmdk' as they are named now.
>>
>> There should be one .vmdk file without the 's00x' part. This should now be named vmwarewinxp-base7-v0.vmdk. This file needs to be edited because it contains the names of the other .vmdk files. You should see an "Extent description" section in the file with the original names. Change each line to include 'vmwarewinxp-base7-v0-s00x.vmdk' instead of the old name.
>>
>> Next, make sure the VCL 'deleted' column in the image and imagerevision tables for this image is set to 0. In the image table, check id=7. You'll have to look at the imagerevision table to figure out which one is for this revision. The imagerevision.imagename value will be vmwarewinxp-base7-v0.
>>
>> Next, make sure there isn't a directory named '/var/lib/vmware/Virtual Machines/vmwarewinxp-base7-v0'. There shouldn't be one but check to make sure. If it exists, rename it for now.
>>
>> Next, cross your fingers and try to make a reservation for this image. If you created and configured multiple VMs in VCL then another one should already be in the available state and you should be able to make a reservation. If not, change the state of your VM to 'available' via Manage Computers.
>>
>> If you have trouble, the following will be useful:
>> $ ls -l /install/vmware_images
>> $ ls -l /install/vmware_images/vmwarewinxp-base7-v0
>> $ ls -l /var/lib/vmware/Virtual\ Machines/
>> $ cat /install/vmware_images/vmwarewinxp-base7-v0/vmwarewinxp-base7-v0.vmdk
>>
>> I suspect a problem with the instructions caused this latest issue. I'll go through them. It goes without saying that we need a much better way to create base image reservations.
>
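The descriptor edit described in the quoted message (pointing each "Extent description" line at the renamed -s00x.vmdk files) can be sketched as a small text rewrite. This is an illustration only and not VCL code. It assumes the extent lines carry the file name in double quotes and end in -sNNN.vmdk; the old name "winxp" and the sector counts in the usage note are made up.

```python
import re

# A quoted extent file name ending in -sNNN.vmdk; the suffix is captured so it is kept.
EXTENT_NAME = re.compile(r'"[^"]*?(-s\d{3}\.vmdk)"')

def rename_extents(descriptor_text, new_base):
    """Return descriptor_text with every split-extent name changed to <new_base>-sNNN.vmdk."""
    return EXTENT_NAME.sub(lambda m: '"' + new_base + m.group(1) + '"', descriptor_text)
```

For example, a line such as RW 4192256 SPARSE "winxp-s001.vmdk" would come back as RW 4192256 SPARSE "vmwarewinxp-base7-v0-s001.vmdk", while lines without a -sNNN.vmdk name are left alone. Keep a copy of the descriptor before overwriting it.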
Re: Only one vm working
Posted by Terry McGuire <tm...@ualberta.ca>.
Hi Andy. A breakthrough! I now have multiple vm guests working! Yay! Weirdly, I still don't have the *first* vm guest - vmguest-1 - working, but at this point I don't really care, as this is all just for evaluation purposes anyway. The details, for the record:
The problem with vmguest-3, and what caused the new error in the vcld log:
____________
Failed to resolve given hostname/IP: vmguest-3. Note that you can't use '/mask' AND '1-4,7,100-' style IP ranges
WARNING: No targets were specified, so 0 hosts scanned.
___________
...was that I had indeed forgotten to add vmguest-3 to /etc/hosts. Adding it made the above error go away and, miraculously, also made ssh successful.
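A missing /etc/hosts entry like this one is easy to overlook when adding guests. As a rough sketch (the helper name missing_from_hosts is hypothetical and not part of VCL), the hosts file text can be scanned for the guest names that vcld will try to reach:

```python
def missing_from_hosts(hosts_text, names):
    """Return the names that have no entry in hosts-file style text."""
    known = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        fields = line.split()
        known.update(fields[1:])  # everything after the address is a name or alias
    return [name for name in names if name not in known]
```

Passing it the contents of /etc/hosts and the list ["vmguest-1", "vmguest-2", "vmguest-3"] would have returned ["vmguest-3"] in the situation described here.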
It remains a mystery why vmguest-1 doesn't work, but since I now have multiple functioning vmguests I'm moving on to other challenges, namely, creating more images, both Windows and Linux, and adding users to the system. A thousand thanks for your perseverance here Andy. Hopefully, I'll be able to deal with the remaining challenges with a little less of your time and patience.
Regards,
Terry
Re: Only one vm working
Posted by Andy Kurth <an...@ncsu.edu>.
Hi Terry,
Is the image booting on vmguest-2 and 3 but SSH is failing, or is it not booting
at all? If it isn't booting, check the computer.drivetype values for the VMs.
I came across an issue with this last week with another pilot attempting to
create a Linux VMware base image. If the value differs among the VMs, try
swapping sda/hda and see what happens.
If the image is booting but SSH isn't responding, check the MAC addresses and IP
addresses that are assigned to the VMs. If it isn't receiving an IP address, do
the private MAC addresses match dhcpd.conf and /etc/hosts? Also, check the VM
host to make sure you don't have multiple instances of a VM using the same MAC
address.
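The duplicate MAC address check can be sketched as follows. This is only an illustration, assuming the addresses appear in the .vmx files as ethernetN.address or ethernetN.generatedAddress lines; the function name duplicate_macs and the idea of passing the file contents in a dictionary are made up for the example.

```python
import re
from collections import defaultdict

# Matches lines such as: ethernet0.address = "00:50:56:00:00:00"
MAC_LINE = re.compile(
    r'^\s*ethernet\d+\.(?:address|generatedAddress)\s*=\s*"([^"]+)"',
    re.IGNORECASE | re.MULTILINE,
)

def duplicate_macs(vmx_by_name):
    """Map each MAC address used by more than one VM to the sorted names of those VMs."""
    users = defaultdict(set)
    for name, text in vmx_by_name.items():
        for mac in MAC_LINE.findall(text):
            users[mac.lower()].add(name)
    return {mac: sorted(names) for mac, names in users.items() if len(names) > 1}
```

An empty result means no two of the supplied .vmx files share an address; it says nothing about whether the addresses match dhcpd.conf.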
I'm not sure what's causing the "Failed to resolve given hostname" error. I'm
guessing this is coming from the nmap command. Was this error listed in
vcld.log or did you see it somewhere else? Please provide some lines leading up
to this error if it's from the log.
Regards,
Andy
Terry McGuire wrote:
> Hi Andy (and anyone else following along here). I've been doing a lot of poking around, and, long story short, I can now (for the first time ever) successfully book and log into the Windows image (yay!) but, annoyingly, only with a single one of the vm guest computers I've configured.
>
> While stumbling around in the dark, I decided to try setting up a Linux base image as well as the Windows one. The process went much quicker, but, unfortunately, it seems to be getting hung up in a similar place to the Windows image, but that's not the interesting thing. When I created the Linux image, I created a new vm guest to run it on ("vmguest-2"). When I got tired of playing with the Linux image, I switched back to the Windows image, and, to my amazement, it worked! And then I realized that it was loading on vmguest-2. Still didn't work on vmguest-1. I created yet another vm - vmguest-3 - but it also won't work on it. Only vmguest-2. I can't quite figure out what's special about it. I even swapped the private ip addresses, so vmguest-1 had vmguest-2's address, same result. (And, with the wiki down at the moment, I can't get to the Linux base image documentation to see if there was something special about how I made the vm in the first place.)
>
> As well, the errors I get are different on vmguest-1 and 3. On 1, it can't ssh into the machine, as before. On 3, it starts giving me these:
> ____________
>
> Failed to resolve given hostname/IP: vmguest-3. Note that you can't use '/mask' AND '1-4,7,100-' style IP ranges
> WARNING: No targets were specified, so 0 hosts scanned.
> ____________
>
> To my newbie eyes, all three vm computers are configured as identically in the vcl computers tables as possible under the circumstances.
>
> Another thing (though probably not related): The machines all come up with 512MB memory, but I've set them to have 1024MB. Clearly, I'm missing some config info somewhere.
>
> At this point it seems I have a useful situation for continued debugging: a working setup, but only for the Windows image, and only for a single VM. There's *gotta* be a way to figure out what's the difference making the difference. I'm not worrying about the Linux image right now. I figure, once I get Windows images running properly, I'll have a much easier time getting Linux working.
>
> On a (related) side note, I see the list is getting much busier with newbies like me asking newbie questions. A mixed blessing? Obvious interest in the product, but a whole lot of support work for you, huh? Once I actually have a clue, I fully intend to start contributing back, to help with this situation.
>
> Terry
>
> On 7 Apr 2010, at 1418h, Andy Kurth wrote:
>
>> Is SSH working and is everything being processed by vcld to the point where you see the Connect button on the web page? If you are just manually running the scripts then RDP won't be available because the firewall port isn't open. vcld opens it later on in the process.
>>
>> I have not seen this error before in the ipconfig output produced by configure_networking.vbs:
>> "An internal error occurred: The file name is too long."
>>
>> I'm wondering if a problem occurred obtaining the IP address. Can you run "ipconfig /all" manually and does this error show up? If SSH is working correctly on the private interface, then I'm guessing there is a routing table problem. There are no 129.x entries. This seems odd. Do any entries appear for 129.x in the routing table if you run "ipconfig /renew", then "route print"?
>>
>> If vcld is completely loading the computer, then the problems that occur in configure_networking.vbs may not be the problem. The output from the log file where "set_public_default_route" is called will be helpful. The .vbs script attempts to set default routes but the vcld code does this again later on.
>
>
--
Andy Kurth
Virtual Computing Lab
Office of Information Technology
North Carolina State University
andy_kurth@ncsu.edu
919.513.4090
Only one vm working [formerly: Base image capture failure]
Posted by Terry McGuire <tm...@ualberta.ca>.
Hi Andy (and anyone else following along here). I've been doing a lot of poking around, and, long story short, I can now (for the first time ever) successfully book and log into the Windows image (yay!) but, annoyingly, only with a single one of the vm guest computers I've configured.
While stumbling around in the dark, I decided to try setting up a Linux base image as well as the Windows one. The process went much quicker, but, unfortunately, it seems to be getting hung up in a similar place to the Windows image, but that's not the interesting thing. When I created the Linux image, I created a new vm guest to run it on ("vmguest-2"). When I got tired of playing with the Linux image, I switched back to the Windows image, and, to my amazement, it worked! And then I realized that it was loading on vmguest-2. Still didn't work on vmguest-1. I created yet another vm - vmguest-3 - but it also won't work on it. Only vmguest-2. I can't quite figure out what's special about it. I even swapped the private ip addresses, so vmguest-1 had vmguest-2's address, same result. (And, with the wiki down at the moment, I can't get to the Linux base image documentation to see if there was something special about how I made the vm in the first place.)
As well, the errors I get are different on vmguest-1 and 3. On 1, it can't ssh into the machine, as before. On 3, it starts giving me these:
____________
Failed to resolve given hostname/IP: vmguest-3. Note that you can't use '/mask' AND '1-4,7,100-' style IP ranges
WARNING: No targets were specified, so 0 hosts scanned.
____________
To my newbie eyes, all three vm computers are configured as identically in the vcl computers tables as possible under the circumstances.
Another thing (though probably not related): The machines all come up with 512MB memory, but I've set them to have 1024MB. Clearly, I'm missing some config info somewhere.
At this point it seems I have a useful situation for continued debugging: a working setup, but only for the Windows image, and only for a single VM. There's *gotta* be a way to figure out what's the difference making the difference. I'm not worrying about the Linux image right now. I figure, once I get Windows images running properly, I'll have a much easier time getting Linux working.
On a (related) side note, I see the list is getting much busier with newbies like me asking newbie questions. A mixed blessing? Obvious interest in the product, but a whole lot of support work for you, huh? Once I actually have a clue, I fully intend to start contributing back, to help with this situation.
Terry
On 7 Apr 2010, at 1418h, Andy Kurth wrote:
> Is SSH working and is everything being processed by vcld to the point where you see the Connect button on the web page? If you are just manually running the scripts then RDP won't be available because the firewall port isn't open. vcld opens it later on in the process.
>
> I have not seen this error before in the ipconfig output produced by configure_networking.vbs:
> "An internal error occurred: The file name is too long."
>
> I'm wondering if a problem occurred obtaining the IP address. Can you run "ipconfig /all" manually and does this error show up? If SSH is working correctly on the private interface, then I'm guessing there is a routing table problem. There are no 129.x entries. This seems odd. Do any entries appear for 129.x in the routing table if you run "ipconfig /renew", then "route print"?
>
> If vcld is completely loading the computer, then the problems that occur in configure_networking.vbs may not be the problem. The output from the log file where "set_public_default_route" is called will be helpful. The .vbs script attempts to set default routes but the vcld code does this again later on.
Re: Base image capture failure
Posted by Andy Kurth <an...@ncsu.edu>.
Is SSH working and is everything being processed by vcld to the point where you
see the Connect button on the web page? If you are just manually running the
scripts then RDP won't be available because the firewall port isn't open. vcld
opens it later on in the process.
I have not seen this error before in the ipconfig output produced by
configure_networking.vbs:
"An internal error occurred: The file name is too long."
I'm wondering if a problem occurred obtaining the IP address. Can you run
"ipconfig /all" manually and does this error show up? If SSH is working
correctly on the private interface, then I'm guessing there is a routing table
problem. There are no 129.x entries. This seems odd. Do any entries appear
for 129.x in the routing table if you run "ipconfig /renew", then "route print"?
If vcld is completely loading the computer, then the problems that occur in
configure_networking.vbs may not be the problem. The output from the log file
where "set_public_default_route" is called will be helpful. The .vbs script
attempts to set default routes but the vcld code does this again later on.
-Andy
> ______________
> configure_networking.vbs beginning to run: 3/23/2010 12:53:50 PM
> Windows Version: 5.1.2600
> ---------------------------------------------------------------------------
> 12:54:26 PM
> ---------------------------------------------------------------------------
> Printing routing table, command: cmd.exe /c %SystemRoot%\system32\route.exe print
> ===========================================================================
> Interface List
> 0x1 ........................... MS TCP Loopback interface
> 0x2 ...00 50 56 00 00 00 ...... AMD PCNET Family PCI Ethernet Adapter #3 - Packet Scheduler Miniport
> 0x10004 ...00 50 56 00 00 01 ...... AMD PCNET Family PCI Ethernet Adapter #4 - Packet Scheduler Miniport
> ===========================================================================
> ===========================================================================
> Active Routes:
> Network Destination Netmask Gateway Interface Metric
> 0.0.0.0 0.0.0.0 192.168.0.1 192.168.1.1 30
> 127.0.0.0 255.0.0.0 127.0.0.1 127.0.0.1 1
> 169.254.0.0 255.255.0.0 169.254.237.166 169.254.237.166 30
> 169.254.237.166 255.255.255.255 127.0.0.1 127.0.0.1 30
> 169.254.255.255 255.255.255.255 169.254.237.166 169.254.237.166 30
> 192.168.0.0 255.255.0.0 192.168.1.1 192.168.1.1 30
> 192.168.1.1 255.255.255.255 127.0.0.1 127.0.0.1 30
> 192.168.1.255 255.255.255.255 192.168.1.1 192.168.1.1 30
> 224.0.0.0 240.0.0.0 169.254.237.166 169.254.237.166 30
> 224.0.0.0 240.0.0.0 192.168.1.1 192.168.1.1 30
> 255.255.255.255 255.255.255.255 169.254.237.166 169.254.237.166 1
> 255.255.255.255 255.255.255.255 192.168.1.1 192.168.1.1 1
> Default Gateway: 192.168.0.1
> ===========================================================================
> Persistent Routes:
> None
> Printing routing table successful, exit code: 0
> 12:54:27 PM
> ---------------------------------------------------------------------------
> 12:54:27 PM
> ---------------------------------------------------------------------------
> Running ipconfig /all, command: cmd.exe /c %SystemRoot%\system32\ipconfig.exe /all
>
> Windows IP Configuration
>
> An internal error occurred: The file name is too long.
>
> Please contact Microsoft Product Support Services for further help.
>
> Additional information: Unable to query host name.
>
> Running ipconfig /all successful, exit code: 0
> 12:54:28 PM
> ----------------------------------------------------------------------
> *** AMD PCNET Family PCI Ethernet Adapter (Index: 1) ***
>
> Adpater name: AMD PCNET Family PCI Ethernet Adapter
> Ignored adpater name section:
> Ignored adpater description section:
> IP address: 129.128.9.119
> Matching VCL private address section:
> Matching non-public address section:
> * PUBLIC_NAME = Local Area Connection
> * DHCP enabled = True
> * PUBLIC_IP = 129.128.9.119
> * PUBLIC_SUBNET_MASK = 255.255.254.0
> * PUBLIC_GATEWAY =
> * PUBLIC_DESCRIPTION = AMD PCNET Family PCI Ethernet Adapter
> ----------------------------------------------------------------------
> *** AMD PCNET Family PCI Ethernet Adapter (Index: 4) ***
>
> Adpater name: AMD PCNET Family PCI Ethernet Adapter
> Ignored adpater name section:
> Ignored adpater description section:
> IP address: 192.168.1.1
> Matching VCL private address section:
> Matching non-public address section: 192.168
> IP address is not a public nor valid VCL private address: 192.168.1.1
> ---------------------------------------------------------------------------
> 12:54:29 PM
> ---------------------------------------------------------------------------
> PRIVATE_NAME =
> PRIVATE_IP =
> PRIVATE_SUBNET_MASK =
> PRIVATE_GATEWAY =
>
> PUBLIC_NAME = Local Area Connection
> PUBLIC_IP = 129.128.9.119
> PUBLIC_SUBNET_MASK = 255.255.254.0
> PUBLIC_GATEWAY =
>
> Failed to retrieve private and public network configuration, returning exit status 1
> _________________
>
>
> Which looks bad to me. Poking around a bit, it seems that configure_networking.vbs expects my private lan to be 10.x.x.x, but I've got it as 192.168.x.x, as per other documentation. Is this relevant?
>
> To summarize this round of glitch-squishing, the sysprep_cmdlines.cmd issue, with the symptom of the failure to autologin after running sysprep, seems to be solved, or at least worked around, by the pre-creation of the Logs folder. But the inability to connect via RDC after the reservation is made persists, which may be due to something going wrong with the configure_networking.vbs script.
>
> Back to you, Andy (with continued gratefulness for your help).
>
> Terry
>
Re: Base image capture failure
Posted by Terry McGuire <tm...@ualberta.ca>.
On 18 Mar 2010, at 0742h, Andy Kurth wrote:
> Hi Terry,
> Sorry for the delay. This information is helpful. You're right, the root cause seems to be that sysprep_cmdlines.cmd isn't running.
>
> I have seen the issue where you can't enter a password before. This only seems to happen for the newer style logon screen, not the classic logon screen. I'm not sure of the cause but you can get to the classic logon screen by pressing Ctrl-Alt-Del twice. Under the VMware console, press Ctrl-Alt-Insert twice.
Weirdly, sometimes I need the classic logon, sometimes I don't. Whatever. I use it when I need to.
> You can begin troubleshooting by examining the C:\Windows\setuplog.txt file. There should be a few lines that look like the section I have copied to the end of this message. Search setuplog.txt for "sysprep_cmdlines.cmd". Does anything show up?
Yes, just like in your example, except it returns an exit code of 1. That's bad, right? Except, skipping ahead a bit, all is well when I apply the sysprep_cmdlines.cmd fix I figured out, exit code 0. Keep reading.
> Next, examine the Sysprep files. A copy of the same exact Sysprep files used when the image loaded should still be on the computer in C:\cygwin\home\root\VCL\Utilities\Sysprep. This directory is copied to C:\Sysprep before an image is captured. Sysprep automatically deletes C:\Sysprep when it finishes, so the VCL code copies everything to C:\cygwin\... and then makes an additional copy in C:\Sysprep so that the files are retained for troubleshooting.
Ah, that explains the "InstallFilesPath=C:\sysprep\i386" bit.
> The "InstallFilesPath=C:\sysprep\i386" line is correct. Within the Sysprep directory, there should also be the following file:
> C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt
>
> This InstallFilesPath line in sysprep.inf causes cmdlines.txt to be processed during minisetup. Sysprep automatically calls the commands in cmdlines.txt before the computer boots Windows for the first time. You should see a call to sysprep_cmdlines.cmd in cmdlines.txt.
>
> So, make sure of the following:
> -cmdlines.txt resides in the location noted above
> -cmdlines.txt includes a line calling sysprep_cmdlines.cmd
This all looks good.
> You can troubleshoot this by manually running Sysprep. But first, load your image by making an imaging reservation (Manage Images -> Create/Update image) rather than a normal reservation. This is because VCL configures the VM to run in persistent mode for imaging reservations and nonpersistent mode for normal reservations. If the VM is running in nonpersistent mode and you reboot the machine, it will likely restart in the initial hard drive state saved in the .vmdk files rather than the state the VM was in before it was rebooted. If running in persistent mode, the VM's hard drive state is saved when it is rebooted.
>
> Manually run Sysprep:
> -Log in as root
> -Copy the entire Sysprep directory under C:\cygwin to C:\
> -Copy the entire C:\cygwin\home\root\VCL\Drivers directory to C:\Sysprep
> -Delete C:\cygwin\home\root\VCL\Logs to replicate the original state
> -Run the command: "C:\Sysprep\sysprep.exe /quiet /reseal /mini /reboot"
>
> You should see the computer reboot into the minisetup phase. Towards the end of this phase, you should see some black command boxes appear then close. This is when sysprep_cmdlines.cmd is being run. It should then reboot again and automatically log on as root.
Same results - autologin doesn't happen.
> If you don't see the black boxes during minisetup and it doesn't autologon, try manually running the command contained within cmdlines.txt after Sysprep is done:
> -Log in as root
> -Delete C:\cygwin\home\root\VCL\Logs
> -Open cmd.exe
> -Run this command (1 line):
> cmd.exe /c C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd > C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log 2>&1
Logging in manually after running sysprep manually, I still don't have the log folder. Running sysprep_cmdlines manually, entered exactly as you have it here, I get "The system cannot find the path specified." Hmm. If I manually create a Logs dir, then rerun the command, stuff happens. Looking at the resulting sysprep_cmdlines.log file, it exits with a status of "0". Ok, I then restart. More promising looking things happen. Autologin, many black boxes, then autologout.
I did examine the permissions for root, and all looked good. I even applied the chmod as you suggested, but the problem persisted. I've now tweaked the image to have the Logs folder pre-created, and things seem to work fine. Autologin, many black boxes, and, when I make a reservation, it makes it all the way to "Connect!" without manual intervention. However, unfortunately, I still can't login with RDC.
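The "cannot find the path specified" failure is what output redirection does when the target directory is missing: the log file cannot be opened, so the command never runs. A minimal sketch of the workaround in Python, purely for illustration (the VCL scripts are batch files, and run_with_log is a made-up name), creates the directory before opening the log:

```python
import os
import subprocess

def run_with_log(command, log_path):
    """Run command with stdout and stderr written to log_path; return the exit code."""
    parent = os.path.dirname(log_path)
    if parent:
        # Opening the log fails if its directory does not exist, so create it first.
        os.makedirs(parent, exist_ok=True)
    with open(log_path, "w") as log:
        return subprocess.run(command, stdout=log, stderr=subprocess.STDOUT).returncode
```

Pre-creating the Logs folder in the image, as described above, has the same effect for the redirection on the cmdlines.txt command line.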
Manually logging back in again via the console, and looking at the post_load log, I see it's exited with a 1. Looking more closely, configure_networking.vbs is exiting with an "errorlevel: 1". Looking at its log, I see this:
______________
configure_networking.vbs beginning to run: 3/23/2010 12:53:50 PM
Windows Version: 5.1.2600
---------------------------------------------------------------------------
12:54:26 PM
---------------------------------------------------------------------------
Printing routing table, command: cmd.exe /c %SystemRoot%\system32\route.exe print
===========================================================================
Interface List
0x1 ........................... MS TCP Loopback interface
0x2 ...00 50 56 00 00 00 ...... AMD PCNET Family PCI Ethernet Adapter #3 - Packet Scheduler Miniport
0x10004 ...00 50 56 00 00 01 ...... AMD PCNET Family PCI Ethernet Adapter #4 - Packet Scheduler Miniport
===========================================================================
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 192.168.0.1 192.168.1.1 30
127.0.0.0 255.0.0.0 127.0.0.1 127.0.0.1 1
169.254.0.0 255.255.0.0 169.254.237.166 169.254.237.166 30
169.254.237.166 255.255.255.255 127.0.0.1 127.0.0.1 30
169.254.255.255 255.255.255.255 169.254.237.166 169.254.237.166 30
192.168.0.0 255.255.0.0 192.168.1.1 192.168.1.1 30
192.168.1.1 255.255.255.255 127.0.0.1 127.0.0.1 30
192.168.1.255 255.255.255.255 192.168.1.1 192.168.1.1 30
224.0.0.0 240.0.0.0 169.254.237.166 169.254.237.166 30
224.0.0.0 240.0.0.0 192.168.1.1 192.168.1.1 30
255.255.255.255 255.255.255.255 169.254.237.166 169.254.237.166 1
255.255.255.255 255.255.255.255 192.168.1.1 192.168.1.1 1
Default Gateway: 192.168.0.1
===========================================================================
Persistent Routes:
None
Printing routing table successful, exit code: 0
12:54:27 PM
---------------------------------------------------------------------------
12:54:27 PM
---------------------------------------------------------------------------
Running ipconfig /all, command: cmd.exe /c %SystemRoot%\system32\ipconfig.exe /all
Windows IP Configuration
An internal error occurred: The file name is too long.
Please contact Microsoft Product Support Services for further help.
Additional information: Unable to query host name.
Running ipconfig /all successful, exit code: 0
12:54:28 PM
----------------------------------------------------------------------
*** AMD PCNET Family PCI Ethernet Adapter (Index: 1) ***
Adpater name: AMD PCNET Family PCI Ethernet Adapter
Ignored adpater name section:
Ignored adpater description section:
IP address: 129.128.9.119
Matching VCL private address section:
Matching non-public address section:
* PUBLIC_NAME = Local Area Connection
* DHCP enabled = True
* PUBLIC_IP = 129.128.9.119
* PUBLIC_SUBNET_MASK = 255.255.254.0
* PUBLIC_GATEWAY =
* PUBLIC_DESCRIPTION = AMD PCNET Family PCI Ethernet Adapter
----------------------------------------------------------------------
*** AMD PCNET Family PCI Ethernet Adapter (Index: 4) ***
Adpater name: AMD PCNET Family PCI Ethernet Adapter
Ignored adpater name section:
Ignored adpater description section:
IP address: 192.168.1.1
Matching VCL private address section:
Matching non-public address section: 192.168
IP address is not a public nor valid VCL private address: 192.168.1.1
---------------------------------------------------------------------------
12:54:29 PM
---------------------------------------------------------------------------
PRIVATE_NAME =
PRIVATE_IP =
PRIVATE_SUBNET_MASK =
PRIVATE_GATEWAY =
PUBLIC_NAME = Local Area Connection
PUBLIC_IP = 129.128.9.119
PUBLIC_SUBNET_MASK = 255.255.254.0
PUBLIC_GATEWAY =
Failed to retrieve private and public network configuration, returning exit status 1
_________________
Which looks bad to me. Poking around a bit, it seems that configure_networking.vbs expects my private lan to be 10.x.x.x, but I've got it as 192.168.x.x, as per other documentation. Is this relevant?
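For reference, both 10.x.x.x and 192.168.x.x are private ranges in general networking terms, so the log's "not a public nor valid VCL private address" message reflects what the script is configured to accept rather than whether the address is private. The sketch below uses Python's standard ipaddress module to show the generic classification; it is not how configure_networking.vbs decides, and the classify function is made up for illustration.

```python
import ipaddress

def classify(address):
    """Label an IPv4 address as 'link-local', 'private', or 'public'."""
    ip = ipaddress.ip_address(address)
    # Check link-local first: 169.254.x.x also counts as private in this module.
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "public"
```

With the addresses from the log above, classify("192.168.1.1") gives "private", classify("129.128.9.119") gives "public", and classify("169.254.237.166") gives "link-local", the range Windows falls back to when DHCP does not answer.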
To summarize this round of glitch-squishing, the sysprep_cmdlines.cmd issue, with the symptom of the failure to autologin after running sysprep, seems to be solved, or at least worked around, by the pre-creation of the Logs folder. But the inability to connect via RDC after the reservation is made persists, which may be due to something going wrong with the configure_networking.vbs script.
Back to you, Andy (with continued gratefulness for your help).
Terry
Re: Base image capture failure
Posted by Andy Kurth <an...@ncsu.edu>.
Hi Terry,
Sorry for the delay. This information is helpful. You're right, the root cause
seems to be that sysprep_cmdlines.cmd isn't running.
I have seen the issue before where you can't enter a password. This only seems
to happen for the newer style logon screen, not the classic logon screen. I'm
not sure of the cause but you can get to the classic logon screen by pressing
Ctrl-Alt-Del twice. Under the VMware console, press Ctrl-Alt-Insert twice.
You can begin troubleshooting by examining the C:\Windows\setuplog.txt file. There
should be a few lines that look like the section I have copied to the end of
this message. Search setuplog.txt for "sysprep_cmdlines.cmd". Does anything
show up?
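As a rough illustration of that search, a few lines of Python can list every line in the log that mentions the script. This is only a sketch: the function name find_mentions is made up for this example, and it assumes the log can be read as plain text.

```python
from pathlib import Path


def find_mentions(log_path, needle="sysprep_cmdlines.cmd"):
    """Return (line_number, text) pairs for lines that mention needle, ignoring case."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    text = Path(log_path).read_text(errors="replace")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle.lower() in line.lower()
    ]
```

Calling find_mentions(r"C:\Windows\setuplog.txt") on the VM returns an empty list when no line of the log mentions the script.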
Next, examine the Sysprep files. A copy of the same exact Sysprep files used
when the image loaded should still be on the computer in
C:\cygwin\home\root\VCL\Utilities\Sysprep. This directory is copied to
C:\Sysprep before an image is captured. Sysprep automatically deletes
C:\Sysprep when it finishes, so the VCL code copies everything to C:\cygwin\...
and then makes an additional copy in C:\Sysprep so that the files are retained
for troubleshooting.
The "InstallFilesPath=C:\sysprep\i386" line is correct. Within the Sysprep
directory, there should also be the following file:
C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt
This InstallFilesPath line in sysprep.inf causes cmdlines.txt to be processed
during minisetup. Sysprep automatically calls the commands in cmdlines.txt
before the computer boots Windows for the first time. You should see a call to
sysprep_cmdlines.cmd in cmdlines.txt.
So, make sure of the following:
-cmdlines.txt resides in the location noted above
-cmdlines.txt includes a line calling sysprep_cmdlines.cmd
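The two checks above could be scripted along these lines. This is a minimal sketch, assuming Python is available on the VM; the function name check_cmdlines is hypothetical, and the path is the one given in this message.

```python
from pathlib import Path

# Location given in the message above.
CMDLINES = Path(r"C:\cygwin\home\root\VCL\Utilities\Sysprep\i386\$oem$\cmdlines.txt")


def check_cmdlines(path=CMDLINES, script="sysprep_cmdlines.cmd"):
    """Return a list of problems found; an empty list means both checks passed."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]
    lines = path.read_text(errors="replace").splitlines()
    if not any(script.lower() in line.lower() for line in lines):
        return [f"{path} has no line calling {script}"]
    return []
```

An empty list only shows that the file is in place and mentions the script. It does not prove that Sysprep actually processed it during minisetup.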
You can troubleshoot this by manually running Sysprep. But first, load your
image by making an imaging reservation (Manage Images -> Create/Update image)
rather than a normal reservation. The reason is that VCL configures the VM to
run in persistent mode for imaging reservations and nonpersistent mode for
normal reservations. If the VM is running in nonpersistent mode and you reboot
the machine, it will likely restart in the initial hard drive state saved in
the .vmdk files rather than the state the VM was in before it was rebooted. If
running in persistent mode, the VM's hard drive state is saved when it is rebooted.
Manually run Sysprep:
-Log in as root
-Copy the entire Sysprep directory under C:\cygwin to C:\
-Copy the entire C:\cygwin\home\root\VCL\Drivers directory to C:\Sysprep
-Delete C:\cygwin\home\root\VCL\Logs to replicate the original state
-Run the command: "C:\Sysprep\sysprep.exe /quiet /reseal /mini /reboot"
You should see the computer reboot into the minisetup phase. Towards the end of
this phase, you should see some black command boxes appear then close. This is
when sysprep_cmdlines.cmd is being run. It should then reboot again and
automatically log on as root.
If you don't see the black boxes during minisetup and it doesn't autologon, try
manually running the command contained within cmdlines.txt after Sysprep is done:
-Log in as root
-Delete C:\cygwin\home\root\VCL\Logs
-Open cmd.exe
-Run this command (1 line):
cmd.exe /c C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd > C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log 2>&1
If things still aren't working, I'm wondering if it could be a permissions
problem. Make sure root is the owner of its home directory:
-Log in as root
-Open a Cygwin shell
-Run: "chown -R root:Administrators ~/"
-Try running Sysprep again
Hope this helps,
Andy
****************
setuplog.txt section showing where sysprep_cmdlines.cmd was run:
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,SetUpVirtualMemory: loc 1
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,Setup configured the system to place a 384 MB pagefile on drive C:.
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,Crashdump was enabled.
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\log.c,133,,SetUpVirtualMemory: EXIT (1)
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\syssetup.c,2725,,Power scheme: desktop.
03/12/2010 15:08:14.155,d:\xpsp\base\ntsetup\syssetup\syssetup.c,2729,,SetActivePwrScheme succeeded.
03/12/2010 15:08:21.343,d:\xpsp\base\ntsetup\syssetup\log.c,133,,The external program cmd.exe /c C:\Cygwin\home\root\VCL\Scripts\sysprep_cmdlines.cmd > C:\Cygwin\home\root\VCL\Logs\sysprep_cmdlines.log 2>&1 returned exit code 0.
03/12/2010 15:08:21.343,d:\xpsp\base\ntsetup\syssetup\syssetup.c,4034,BEGIN_SECTION,Fixing up hives
03/12/2010 15:08:21.593,d:\xpsp\base\ntsetup\syssetup\syssetup.c,4041,END_SECTION,Fixing up hives
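If it helps, the exit code reported for the script can be pulled out of setuplog.txt text with a short Python sketch like the one below. The regular expression is based only on the wording in the excerpt above; other outcomes may be logged with different wording, so treat the pattern and the function name external_program_results as assumptions.

```python
import re

# Based on the "The external program ... returned exit code N." wording in the
# excerpt above. \s+ tolerates entries that were wrapped across several lines.
EXIT_RE = re.compile(
    r"The external\s+program\s+(.+?)\s+returned\s+exit\s+code\s+(-?\d+)\.",
    re.DOTALL,
)


def external_program_results(log_text):
    """Return (command, exit_code) pairs, collapsing line breaks inside the command."""
    return [
        (" ".join(command.split()), int(code))
        for command, code in EXIT_RE.findall(log_text)
    ]
```

An exit code of 0 for the sysprep_cmdlines.cmd entry, as in the excerpt, indicates that the command ran and cmd.exe reported success. If there is no such entry, the command was probably never called.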
Re: Base image capture failure
Posted by Terry McGuire <tm...@ualberta.ca>.
Hi Andy. First, let me thank you once again for sticking with this ongoing saga. Next time I'm in North Carolina, I'll buy you a beer!
On 2 Mar 2010, at 0831h, Andy Kurth wrote:
> You will need to watch the VM console after the VM is turned on in order to troubleshoot this. You should see the following:
>
> -VM is turned on
> -Sysprep minisetup runs, VM is rebooted
> -When Windows boots up for the first time, the root account is automatically logged on
> -A few black command boxes appear on the desktop, the one in the back is named post_load.cmd
> -When the command boxes close, root is logged off
> -At this point, the computer should respond to SSH
I see the VM turn on, sysprep runs, VM reboots, but then after Windows loads it just stays at the login window.
> You should be able to log on as root via the VMware console. The password should be the one configured as WINDOWS_ROOT_PASSWORD /etc/vcl/vcld.conf. After logging in, view the log files generated by the VCL scripts. All of the output generated by the scripts gets saved into files in C:\cygwin\home\root\VCL\Logs.
I can indeed log in as root via the console, with the password I put in vcld.conf. However, there is no Logs folder in C:\cygwin\home\root\vcl - just Drivers, Scripts, Security and Utilities. Which, after poking around a bit, means that post_load has not run, yes?
If I run post_load manually, everything seems to move along nicely, but after it logs out, I can no longer log in as root. It doesn't let me type a password, even though it's asking for one.
Interestingly, if I make a reservation at this point, the reservation appears to be set up properly, and is acknowledged through the web interface, but when I try to log in via RDC, it fails, acting like there's no machine to talk to. (And, yes, I'm trying to connect from the same machine I clicked the "Connect!" button on.) The vcld log suggests all is well (there's lots of log, as you'd know - let me know if I should send you any of it.)
Trying to SSH to or ping the VM on its public address fails, but that might be normal, yes?
> The troubleshooting steps depend largely on whether or not you see root being automatically logged on.
>
> If root is not logged on automatically, the problem can probably be found in sysprep_cmdlines.log and the files in Logs\sysprep_cmdlines directory. These files are generated during the Sysprep minisetup stage when Scripts\sysprep_cmdlines.cmd runs. This script configures root's autologon and sets a registry key to cause Scripts\post_load.cmd to run after root is automatically logged on.
> If it's attempting to log on root but failing because of a credentials problem, the cause could be that the password was not correctly configured in Scripts\autologon_enable.cmd. Check the "set PASSWORD=" line in this file.
The autologon_enable.cmd script has the correct password, and when I run it and then restart, autologon works.
> If root is being logged on, first check if the Cygwin SSHD service is running and if the firewall has an exception for TCP port 22. Be sure to check both the middle "Exceptions" tab and the settings for each adapter under the "Advanced" tab for the exception. My guess is that SSHD failed to start. The problem can probably be found in Logs\post_load.log and in the files in the Logs\post_load directory. Check Logs\update_cygwin.cmd for errors.
>
> As you'll see in the log files, there's a lot that has to happen in order for everything to work correctly. The output from the log files will be helpful in order to figure this out.
Poking around in the scripts folder, I see that this whole post-load series of events is contingent on Sysprep running sysprep_cmdlines, which perhaps it's not doing. Does it matter that the sysprep.inf file includes "InstallFilesPath=C:\sysprep\i386"? This doesn't seem right to me, but to change it I'd need to alter the base image, which frightens me. So, I'll await your reply before trying anything that crazy.
Terry