Posted to user@vcl.apache.org by Henry E Schaffer <he...@unity.ncsu.edu> on 2009/04/30 16:08:33 UTC

Getting Started - 9

  Now for another major aspect to the VCL, one which can add a great
amount of value.

  Do you want to do distributed memory parallel HPC (high performance
computing) on this system?  (This is now the major way of providing 
computational science and other HPC capabilities.) This has worked 
*extremely* well at NC State (see some of our documents describing our
total system - our HPC web site is at http://hpc.ncsu.edu/ ) in greatly
extending the use of the blades (e.g. when our class use almost
disappears at the end of the semester, the TAs and faculty ramp up their
research use.)  This additional use provides a substantial uptick in the
overall economy of the VCL.  Even if your institution doesn't need this
service, you might be collaborating with another institution, or
several, which do - and so the sharing can significantly enhance the
economics.  (My personal view is that computational science should be
very prevalent in graduate work in the STEM fields, and be common in
undergraduate studies.)

  At NC State we take a university-wide view of TCO (Total Cost of
Ownership).  Having the teaching and research areas collaborate to save
overall budget is seen as a very desirable outcome.  We have found that
many universities separate budgeting into unconnected teaching and
research funding, and so don't seem to have the same motivation to
economize over the entire university budget.

  An economical and wise choice is to include both HPC and "desktop
augmentation" (that's a good term to cover class laboratory, homework,
and all individual uses of single images) in your VCL planning.  If you
do include HPC capabilities, you'll need to choose, install, and
learn your HPC workload management software.  NC State has
been using LSF (http://www.platform.com/Products/platform-lsf/). Others
such as Rocks (http://www.rocksclusters.org/) should also work.
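
  As an illustration, here is a minimal sketch of handing a small
parallel job to LSF by calling bsub from Python.  The job name, slot
count, run limit, output file and program path are all placeholder
assumptions - substitute whatever your own queues and applications use.

  # Minimal sketch: submit a 4-slot parallel job to LSF via bsub.
  # Every option value below is a placeholder, not a recommendation.
  import subprocess

  cmd = [
      "bsub",
      "-J", "vcl_test",         # job name (placeholder)
      "-n", "4",                # number of job slots requested
      "-W", "0:30",             # run limit of 30 minutes
      "-o", "vcl_test.%J.out",  # output file; %J becomes the LSF job ID
      "mpirun", "./hello_mpi",  # parallel program to run (placeholder)
  ]
  subprocess.run(cmd, check=True)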

  This is the end of my laundry list of preparations required before
going into full production.
-- 
--henry schaffer

Getting Started - 12

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.
  Let's return to the "getting started" theme by looking at
specifications for the first blade server chassis and blades which can
be used as the basis for learning, experimentation, testing and initial
production pilots.

  Sample Specs for that first chassis follow - but please start by
trying very hard to do this collaboratively!  I've obtained these specs
from my friend and colleague Eric Sills.

  These refer to the IBM BladeCenter 
http://www-03.ibm.com/systems/bladecenter/
products which we started using for our campus distributed memory
parallel HPC computing facility and then put into service to run the
VCL's "Desktop Augmentation" service.

  Following the concept of "scaling up": we already knew this hardware
product and were already using IBM's xCAT software (mentioned before),
so it made a lot of sense to continue with this same base
infrastructure.  We've continued to be very satisfied with this choice
for a number of reasons including capability, density, power efficiency
and reliability.

one BladeCenter E chassis

chassis power supplies 3&4 (if using more than 6 blade servers)

two chassis Ethernet switch modules (we use BNT layer 2/3 copper
switches)

one (or two) chassis IO module(s) to directly attach the blade used as
the VCL management node to storage - may be SAS, iSCSI, or Fibre
Channel (we have used both FC and iSCSI with optical or copper
pass-through modules)

four to fourteen blade servers.  We have been using Intel blades (HS2x)
with dual Xeon processors (most recently quad-core), about 2GB of
memory per core, and a SAS disk drive.  For running VMs on a hypervisor
you may want to choose one of the larger disks, but for bare metal
loads a 73GB disk will be more than sufficient.

This system needs to be rack mounted and will need 208VAC power. The E
chassis has four power supplies with cords that connect to C19 outlets -
so you will need a rack PDU with at least four C19 outlets to connect
the chassis.  We don't totally fill up each rack, in part to limit power
density in the room.  Also there may be a need to put some other devices
in the rack.
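
  To make the sizing above more concrete, here is a back-of-the-envelope
capacity calculation.  The per-blade figures (dual quad-core Xeons,
about 2GB of memory per core) come from the sample specs; treating one
core and 2GB as a single-VM slice is purely an illustrative assumption.

  # Rough per-chassis capacity based on the sample specs above.
  blades_per_chassis = 14       # BladeCenter E maximum
  sockets_per_blade = 2         # dual Xeon
  cores_per_socket = 4          # quad-core
  gb_per_core = 2               # about 2GB memory/core

  cores_per_blade = sockets_per_blade * cores_per_socket       # 8
  gb_per_blade = cores_per_blade * gb_per_core                 # 16
  print("cores per chassis:", blades_per_chassis * cores_per_blade)     # 112
  print("memory per chassis (GB):", blades_per_chassis * gb_per_blade)  # 224
  # With one core and 2GB per VM (assumed), a full chassis could host
  # roughly 112 single-core reservations, or 14 bare metal reservations.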

  As scaling up is done, serious attention should be paid to the machine
room design.  Good hot/cold aisle design setups will help improve
cooling and reduce the power required for cooling.  Improved cooling
also extends the hardware life. There are also some very interesting and
effective cooling products which attach to the back of the racks or
in between the racks and which promise to be more effective
than hot/cold aisles.
-- 
--henry schaffer

Getting Started - 11

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.
  Economies of scale are inherent in the VCL and in Cloud Computing
generally.  To benefit from them requires scaling up! :-)  These posts
have emphasized productive ways to get to the point of scaling up.

  Early on, after adding 100 blades to our VCL/HPC system, we checked
on the increase in our work load.  It went up by 2-3 hours per month.
This was very gratifying, as our experience with standard individual
desktop machines in student labs was that two additional labs of 50
machines each required about an extra FTE of
staff time.  However, as we've continued to scale up with another 1,000
blades we find the incremental work load per additional 100 blades to be
considerably smaller than that. How little?  I'm not sure, but the same
staff we had in the beginning is still in place.
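
  To put those two figures on the same scale, here is a small
back-of-the-envelope comparison.  Converting 1 FTE to roughly 160 staff
hours per month is an assumption, used only to make the two numbers
comparable.

  # Incremental staff effort per machine per month, from the figures above.
  fte_hours_per_month = 160     # assumed: 1 FTE is roughly 160 hours/month

  # Traditional labs: two extra labs of 50 machines each took ~1 extra FTE.
  lab_hours_per_machine = (1 * fte_hours_per_month) / (2 * 50)   # 1.6

  # Early VCL/HPC experience: 100 added blades took ~2.5 extra hours/month.
  vcl_hours_per_blade = 2.5 / 100                                # 0.025

  print("labs: %.2f hours/machine/month" % lab_hours_per_machine)
  print("VCL:  %.3f hours/blade/month" % vcl_hours_per_blade)
  print("ratio: roughly %d to 1" %
        round(lab_hours_per_machine / vcl_hours_per_blade))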

  This emphasizes the value of running large VCL installations
collaboratively.  As one of my colleagues recently wrote, in a small
VCL installation personnel costs are a major cost factor; at scale,
they become a minor one.

  Have I mentioned the payoff of approaching this project
incrementally? :-)  The benefits are great, the options you have stay
open longer, and the end results are better.

  The next post will return to the original "getting started" theme by
giving an example of specifications for the first blade server chassis
and blades which can be used as the basis for learning, experimentation,
testing and initial production pilots.
-- 
--henry schaffer

Getting Started - 10

Posted by Henry E Schaffer <he...@unity.ncsu.edu>.
  We've reviewed a laundry list of preparations required before going
into full production.  How long will it take to do all of that - and
whatever else I haven't listed?  I'm not sure, but it's likely measured
in months rather than weeks.

  If you started with one chassis, you can then expand, gaining more
experience as you add more hardware.  Other than inter-chassis switching,
there isn't much new.  You'll have to add (assign) another Management
Node when you get past 100 blades or so.  You'll find that the labor to
run this expanded system hardly increases as you scale up - other than
the labor involved in the blade/chassis/rack installations.
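
  That management node rule of thumb can be written down as a trivial
planning calculation - treating the roughly 100 blades per management
node figure as the planning assumption it is, not a hard limit.

  import math

  def management_nodes_needed(blade_count, blades_per_mn=100):
      """Rough planning figure: one VCL management node per ~100 blades."""
      return max(1, math.ceil(blade_count / blades_per_mn))

  print(management_nodes_needed(14))    # 1 - a single starter chassis
  print(management_nodes_needed(250))   # 3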

  Alternatively, you can assess whether the VCL approach is the right
one for you.  I may be sure it will be, but it's better for you to do
your evaluation and make your decisions based on your situation.

  If you've started with one chassis rather than the 10, 20 or 50 you
really wanted to buy to make a major impact, you've realized many
important benefits.  One significant one is that, while you've been
learning and experimenting, the other blades haven't been using up their
warranty time, haven't been getting older, and will come in brand new
and shiny when they are delivered.

  You'll find it relatively easy to scale up.  There is the nuisance of
assembling blades, racks, etc.  It's kind of similar to that Erector Set
you had when you were a kid (or did you start with a Heathkit
Electronics setup?) - so assemble everything - including providing all
the power and cooling needed (by the way, we've found that the IBM Cool
Blue Rear Doors help with respect to machine room cooling, and also with
respect to the energy bill).  This shouldn't be a surprise if you paid
attention to that long-ago thermo course. :-)  There are a number of
products which improve the thermodynamics of cooling vs. the traditional
hot aisle/cold aisle arrangement.  There will also be a need to add
network switches to allow inter-chassis networking, and you'll have to
configure the switches. After those initial efforts, you'll find that
the additional effort to keep the system running is very small.

  These are economies of scale, and they are more easily realized by
doing the scaling after the startup system has been mastered.
-- 
--henry schaffer