Posted to user@vcl.apache.org by Henry E Schaffer <he...@unity.ncsu.edu> on 2009/04/27 16:33:30 UTC

Getting Started - 6

  There are other chores involved in getting even a small starter VCL
system up to speed.

  Downloading, installing, and learning how to use xCAT for loading
images on "bare metal" blades has to be done.  (I think it still must be
obtained from the xCAT site http://xcat.sourceforge.net/ rather than
being distributed through the Apache Foundation site.)  

  The VCL is based on the common "LAMP" software environment - Linux,
Apache, MySQL, PHP, Perl.  If you don't already work in this
environment, there will be a somewhat longer learning curve.  The
current VCL code uses MySQL for its operational and historical data
storage.  If this is not the RDBMS which you already use, there might be
a small learning curve, or some mods will be required to substitute a
different SQL database. 

  Mess around with your hypervisor of choice, e.g. VMware, Xen, ..., if
you are using one.  (We primarily use VMware when we want to use a
hypervisor.)  If you don't care to use a hypervisor initially, or ever,
then don't. "Bare metal" blades work very well, and are the setup of
choice for apps which will use up most or all of the CPU's capabilities.
However, with the increasing availability and popularity of multicore
processors, it is likely that hypervising is in your future.  But it
certainly doesn't have to be done in the initial stages of operation.

  Interface the VCL with your campus enterprise storage so users can
have their allocated storage visible on their VCL session. (That's
usually easily done by installation of the appropriate client on the
base image.)  Make your own web page "skin" for your institution; it
might as well reflect your institution's look and feel - or make
multiple pages, one for the look and feel of each
institution/organization in your collaborative.

  All of this is needed to get the starter VCL system ready to use.  So
next we'll look at getting started in the realm of use.
-- 
--henry schaffer

Re: Getting Started - 8

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

Josh writes:
> Actually, all resources allocated to a block reservation remain allocated to 
> it for the duration of that time period.  So, if an instructor needs to 
> lecture for 20 minutes or so at the start of class, the computers will still 
> be available for reservations.  If it were a special situation where the 
> class met for something like 3 hours in a row, we'd work with the instructor 
> to try to get a more limited time during which the students would be using 
> VCL.

  Thanks for the clarification.  I think this further illustrates my
point that, "I can't overemphasize how important it is to develop a good
working relationship with the instructors who use the VCL in their
classes!"

--henry schaffer

Re: Getting Started - 8

Posted by Josh Thompson <jo...@ncsu.edu>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday April 29, 2009, Henry E Schaffer wrote:
> Remember that any unused imaged computer will be released
> from the block reservation after 15 minutes, so this doesn't tie up much
> hardware and it does minimize the stress on the instructor.

Actually, all resources allocated to a block reservation remain allocated to 
it for the duration of that time period.  So, if an instructor needs to 
lecture for 20 minutes or so at the start of class, the computers will still 
be available for reservations.  If it were a special situation where the 
class met for something like 3 hours in a row, we'd work with the instructor 
to try to get a more limited time during which the students would be using 
VCL.

Josh
- -- 
- -------------------------------
Josh Thompson
Systems Programmer
Virtual Computing Lab (VCL)
North Carolina State University

Josh_Thompson@ncsu.edu
919-515-5323

my GPG/PGP key can be found at pgp.mit.edu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFJ+LcNV/LQcNdtPQMRAipSAJ47ncUqsnZZCwwxs119u27j00yfCACeP1PL
ejlkndb0xSmYuwvtiEBAB8U=
=oREc
-----END PGP SIGNATURE-----

Getting Started - 12

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

  Let's return to the "getting started" theme by looking at
specifications for the first blade server chassis and blades which can
be used as the basis for learning, experimentation, testing and initial
production pilots.

  Sample Specs for that first chassis follow - but please start by
trying very hard to do this collaboratively!  I've obtained these specs
from my friend and colleague Eric Sills.

  These refer to the IBM BladeCenter 
http://www-03.ibm.com/systems/bladecenter/
products which we started using for our campus distributed memory
parallel HPC computing facility and then put into service to run the
VCL's "Desktop Augmentation" service.

  Following the concept of "scaling up", we already knew this hardware
product, we were already using IBM's xCAT software (mentioned before), 
and so it made a lot of sense to continue with this same base 
infrastructure.  We've continued to be very satisfied with this choice
for a number of reasons including capability, density, power efficiency
and reliability.

one BladeCenter E chassis

chassis power supplies 3&4 (if using more than 6 blade servers)

two chassis Ethernet switch modules (we use BNT layer 2/3 copper
switches)

one (or two) chassis IO module(s) to directly attach the blade used as
the VCL management node to storage - may be SAS, iSCSI, or Fibre Channel
(we have used both FC and iSCSI with optical or copper pass through
modules)

four to fourteen blade servers.  We have been using Intel blades (HS2x)
with dual Xeon processors (most recently quad-core), about 2GB
memory/core, and a SAS disk drive.  For running VMs on a hypervisor you
may want to choose one of the larger disks, but for bare metal
loads a 73GB disk will be more than sufficient.

This system needs to be rack mounted and will need 208VAC power. The E
chassis has four power supplies with cords that connect to C19 outlets -
so you will need a rack PDU with at least four C19 outlets to connect
the chassis.  We don't totally fill up each rack, in part to limit power
density in the room.  Also there may be a need to put some other devices
in the rack.
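
  As a rough sketch of the sizing rules in the list above, here is a
small Python helper.  It only encodes what this post states (four to
fourteen blades, power supplies 3&4 when using more than 6 blades, dual
quad-core processors at about 2GB memory/core, a PDU with at least four
C19 outlets).  The function and field names are made up for
illustration and are not part of the VCL or of any IBM tool.

```python
def chassis_plan(blades, cores_per_blade=8, gb_per_core=2):
    """Summarize one BladeCenter E chassis using this post's rules of thumb."""
    if not 4 <= blades <= 14:
        raise ValueError("this sample spec covers four to fourteen blades")
    # Power supplies 3&4 are only called for with more than 6 blade servers.
    power_supplies = 4 if blades > 6 else 2
    return {
        "blades": blades,
        "power_supplies": power_supplies,
        # The post recommends a rack PDU with at least four C19 outlets.
        "pdu_c19_outlets_min": 4,
        # Dual quad-core Xeons give 8 cores; about 2GB of memory per core.
        "memory_gb_per_blade": cores_per_blade * gb_per_core,
    }


plan = chassis_plan(14)
print(plan)
```

  Adjust the defaults if your blades differ from the dual quad-core
configuration described here.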

  As scaling up is done, serious attention should be paid to the machine
room design.  Good hot/cold aisle design setups will help improve
cooling and reduce the power required for cooling.  Improved cooling
also extends the hardware life. There are also some very interesting and
effective cooling products which attach to the back of the racks or
in between the racks and which promise to be more effective
than hot/cold aisles.
-- 
--henry schaffer

Getting Started - 11

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

  Economies of scale are inherent in the VCL and in Cloud Computing
generally.  To benefit from them requires scaling up! :-)  These posts
have emphasized productive ways to get to the point of scaling up.

  Early in our experience, after adding 100 blades to our VCL/HPC
system, we checked on the increase in our work load.  It went up
by 2-3 hours per month.  This was very gratifying, as our experience with
the standard individual desktop machines in student labs was that two
additional labs of 50 machines each required about 1 extra FTE of
staff time.  However, as we've continued to scale up with another 1,000
blades we find the incremental work load per additional 100 blades to be
considerably smaller than that. How little?  I'm not sure, but the same
staff we had in the beginning is still in place.
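
  To put the comparison above into per-machine terms, here is a
back-of-the-envelope calculation in Python.  The 2-3 hours per month
and the "1 FTE per 100 lab machines" figures come from this post; the
160 hours per month for one FTE is my own assumption (40 hours a week
for 4 weeks), so treat the ratio as an order of magnitude only.

```python
# Assumption, not from the post: one FTE is about 40 hours/week * 4 weeks.
FTE_HOURS_PER_MONTH = 160


def staff_hours_per_machine(hours_per_month, machines):
    """Staff hours per machine per month."""
    return hours_per_month / machines


# Two student labs of 50 desktops each needed about 1 extra FTE.
lab = staff_hours_per_machine(FTE_HOURS_PER_MONTH, 100)
# Adding 100 blades raised the work load by 2-3 hours per month.
blade_low = staff_hours_per_machine(2, 100)
blade_high = staff_hours_per_machine(3, 100)

print(f"lab desktops: {lab:.2f} hours/machine/month")
print(f"blades: {blade_low:.2f} to {blade_high:.2f} hours/machine/month")
print(f"ratio: roughly {lab / blade_high:.0f}x to {lab / blade_low:.0f}x")
```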

  This emphasizes the value of running large VCL installations
collaboratively.  As one of my colleagues recently wrote, in a small
VCL installation personnel costs are a major cost factor.  At scale,
the personnel costs become a minor cost factor.

  Have I mentioned the payoff of approaching this project
incrementally? :-)  The benefits are great, the options you have stay
open longer, and the end results are better.

  The next post will return to the original "getting started" theme by
giving an example of specifications for the first blade server chassis
and blades which can be used as the basis for learning, experimentation,
testing and initial production pilots.
-- 
--henry schaffer

Getting Started - 10

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

  We've reviewed a laundry list of preparations required before going
into full production.  How long will it take to do all of that - and
whatever else I haven't listed?  I'm not sure, but it's likely measured
in months rather than weeks.

  If you started with one chassis, then you can expand, gaining more
experience, and add more hardware.  Other than inter-chassis switching,
there isn't much new.  You'll have to add (assign) another Management
Node when you get past 100 blades or so.  You'll find that the labor to
run this expanded system hardly increases as you scale up - other than
the labor involved in the blade/chassis/rack installations.
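
  As a hedged illustration of that expansion arithmetic, the sketch
below estimates chassis and Management Node counts for a given number of
blades.  It assumes 14 blades per BladeCenter E chassis (the upper end
of the sample spec in post 12) and treats "100 blades or so" as a simple
threshold.  Both constants are rules of thumb rather than hard limits,
and the function name is made up for illustration.

```python
import math

BLADES_PER_CHASSIS = 14      # upper end of the sample spec in post 12
BLADES_PER_MGMT_NODE = 100   # "100 blades or so"; a rule of thumb only


def scale_plan(blades):
    """Estimate chassis and Management Node counts for a blade total."""
    if blades < 1:
        raise ValueError("need at least one blade")
    return {
        "chassis": math.ceil(blades / BLADES_PER_CHASSIS),
        "management_nodes": math.ceil(blades / BLADES_PER_MGMT_NODE),
    }


for total in (14, 100, 101, 1000):
    print(total, scale_plan(total))
```

  This does not account for blades which are themselves used as
Management Nodes, nor for the inter-chassis switches.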

  Alternatively, you can assess whether the VCL approach is the right
one for you.  I may be sure it will be, but it's better for you to do
your evaluation and make your decisions based on your situation.

  If you've started with one chassis rather than the 10, 20 or 50 you
really wanted to buy to make a major impact, you've realized many
important benefits.  One significant one is that, while you've been
learning and experimenting, the other blades haven't been using up their
warranty time, haven't been getting older, and will come in brand new
and shiny when they are delivered.

  You'll find it relatively easy to scale up.  There is the nuisance of
assembling blades, racks, etc.  It's kind of similar to that Erector Set
you had when you were a kid (or did you start with a Heathkit
Electronics setup?) - so assemble everything - including providing all
the power and cooling needed.  By the way, we've found that the IBM Cool
Blue Rear Doors help with respect to machine room cooling, and also with
respect to the energy bill.  This shouldn't be a surprise if you paid
attention to that long-ago thermo course. :-)  There are a number of
products which improve the thermodynamics of cooling vs. the traditional
hot aisle - cold aisle.  There will also be the need to add network
switches to allow inter-chassis networking, and you'll have to
configure the switches.  After those initial efforts, you'll find that
the additional effort to keep the system running is very small.

  These are economies of scale, and they are more easily realized by
doing the scaling after the startup system has been mastered.
-- 
--henry schaffer

Getting Started - 9

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

  Now for another major aspect to the VCL, one which can add a great
amount of value.

  Do you want to do distributed memory parallel HPC (high performance
computing) on this system?  (This is now the major way of providing 
computational science and other HPC capabilities.) This has worked 
*extremely* well at NC State (see some of our documents describing our
total system - our HPC web site is at http://hpc.ncsu.edu/ ) in greatly
extending the use of the blades (e.g. when our class use almost
disappears at the end of the semester, the TAs and faculty ramp up their
research use.)  This additional use provides a substantial uptick in the
overall economy of the VCL.  Even if your institution doesn't need this
service, you might be collaborating with another institution, or
several, which do - and so the sharing can significantly enhance the
economics.  (My personal view is that computational science should be
very prevalent in graduate work in the STEM fields, and be common in
undergraduate studies.)

  At NC State we take a university-wide view of TCO (Total Cost of
Ownership).  Having the teaching and research areas collaborate to save
overall budget is seen as a very desirable outcome.  We have found that
many universities separate budgeting into unconnected teaching and
research funding, and so don't seem to have the same motivation to
economize over the entire university budget.

  An economical and wise choice is to include both HPC and "desktop
augmentation" (that's a good term to cover class laboratory, homework,
and all individual uses of single images) in your VCL planning.  If you
do include the HPC capabilities, you will need to choose, install and
learn an HPC workload management package.  NC State has
been using LSF (http://www.platform.com/Products/platform-lsf/). Others
such as Rocks (http://www.rocksclusters.org/) should also work.

  This is the end of my laundry list of preparations required before
going into full production.
-- 
--henry schaffer

Getting Started - 8

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

  Scaling up to wider production adds to the scope of the project.

  There are also likely to be some cultural issues which come with
increasing production, particularly with scheduled "block" reservations.
Our experience is that this is a very sensitive point, and is the area
where resources first run out.  For a class block session, there should
be a set of images, authorized for the class roll, sitting there ready
to be logged into during the first minute of the class.  That means that
the imaging for that group of computers must be started well before the
time of the class.  We typically start the process 30 minutes in advance
for loading bare metal blades, to make sure that they are all ready.  We
also start up the process on 1 or 2 more blades than were reserved -
just to help ensure that a glitch, such as an extra boot, doesn't hinder
the class.  Remember that any unused imaged computer will be released
from the block reservation after 15 minutes, so this doesn't tie up much
hardware and it does minimize the stress on the instructor.  (I can't
overemphasize how important it is to develop a good working relationship
with the instructors who use the VCL in their classes!)
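
  The lead time and spare blades described above can be sketched as a
small calculation.  This is only an illustration of the arithmetic (30
minutes of lead time for bare metal loads, 1 or 2 extra blades); it is
not the VCL scheduler's code, and the function and parameter names are
made up.

```python
from datetime import datetime, timedelta


def block_prep(class_start, seats, lead_minutes=30, spare_blades=2):
    """Work out when to start imaging and how many blades to image."""
    return {
        # Start loading well before class so images are ready at minute one.
        "start_imaging_at": class_start - timedelta(minutes=lead_minutes),
        # Image a few more blades than were reserved, in case of a glitch.
        "blades_to_image": seats + spare_blades,
    }


plan = block_prep(datetime(2009, 9, 1, 10, 0), seats=25)
print(plan["start_imaging_at"], plan["blades_to_image"])
```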

  Ad hoc use doesn't have this stress.  If the requested image isn't
available at the time of request, there is a delay while the blade is
reimaged - but it doesn't delay a class.  The (individual) user can use
that time to check e-mail, etc., and then use the image when it becomes
available as is indicated on the Connect! web page.

  Next, on to another "face" of the VCL which plays a very important
role in the overall economic efficiency of this system.
-- 
--henry schaffer

Getting Started - 7

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.

  Getting started in your realm of use involves some very important
effort which can make the difference between success and failure.  This
is not technically oriented effort and so might be overlooked in the
flurry of hardware acquisition/installation, software downloading and
familiarization and the rest of all that fun!

  At this point you have the starter VCL system running, and your
internal systems staff has done a thorough job of using it in test mode.
Now is the time to go outside to the rest of the campus. Train some 
early faculty adopters and teaching support staff.  Have them try out
the VCL with their students - work with them and smooth out any rough
edges in the training and/or the campus culture. Getting them used to
the VCL, and getting your own staff used to helping users use the VCL
productively, are not technical systems chores, but they are very
important.

  You may find some surprises. For example, people who haven't run a
remote system before (whether through RDP or X) may be taken aback by
the need to possibly install and then use the client software on their
own desktop or laptop. Even though most computers come with either RDP
or X, very few, if any, come with both. Many users have used neither.
It doesn't take a lot of effort to bring them up to speed, but 
neglecting them can cause them to have unhappy experiences and be lost
as users and supporters. Look through the past posts on this list for
some "gotchas" in this area.

  Now you can move to real production - and that's just scaling up what
you already have running.  Scaling up the hardware is straightforward,
although the networking does increase in complexity with the need for
additional outside-the-chassis switches to allow inter-chassis
communication.

  There are more issues which arise with scaling up into production
mode - to be discussed next.
-- 
--henry schaffer