Posted to users@subversion.apache.org by Thomas Harold <tg...@tgharold.com> on 2006/11/28 00:27:52 UTC

Repository storage question (RAID)

Semi-off-topic / semi-on-topic...

I'm getting ready to bulk out our storage for our repositories (and 
there's other stuff running in the background, but that happens during 
off-peak hours).

Would it be better to go with a 4-disk RAID10 made up of 750GB SATA 
drives, or an 8-disk RAID10 made up of 320GB or 400GB SATA drives?  Do 
the extra spindles gain us enough to make the power increase worth it?

(This would all be Software RAID in Linux 2.6 done on a PCIe box with 
plenty of I/O bandwidth.)
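
As a rough back-of-envelope on the two options, a small sketch (the per-drive wattage below is an assumed ballpark, not a spec):

```python
# Rough comparison of the two RAID10 options above.
# RAID10 usable capacity is half the raw total; the watts-per-drive
# figure is an illustrative assumption, not a datasheet number.

def raid10_summary(drives, size_gb, watts_per_drive=8):
    usable_gb = drives * size_gb // 2
    return {
        "drives": drives,
        "usable_gb": usable_gb,
        "approx_watts": drives * watts_per_drive,
    }

four_disk = raid10_summary(4, 750)   # usable: 1500 GB
eight_disk = raid10_summary(8, 400)  # usable: 1600 GB

print(four_disk)
print(eight_disk)
```

So the 8-disk build buys roughly the same usable space plus extra spindles, at about twice the drive power draw.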

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Repository storage question (RAID)

Posted by Les Mikesell <le...@gmail.com>.
On Mon, 2006-11-27 at 22:24, Thomas Harold wrote:

> I wasn't sure whether the additional spindles would drive random access 
> times down enough to be worth the extra drive bays.  The issue isn't so 
> much the SVN access, it's everything else that also lives on that set of 
> RAID10 drives (SVN is just one of the multiple Xen DomUs running).
>
> Or whether I need to suck it up and move to a faster RPM drive (driving 
> our costs up).  In which case I would probably merely break the heavy 
> disk activities out to a 2nd, dedicated box with SCSI/SAS/SATA 10k disks.

I'd max out the RAM in the box before spending a lot on faster
drives.  Avoiding i/o and seeks is even better than speeding
them up.
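
The RAM-first argument can be sketched numerically. All latency figures below are illustrative assumptions (a cached read path well under a millisecond, ~12 ms for a 7200rpm random read, ~8 ms for a 10k drive), not measurements:

```python
# Why a higher cache hit ratio beats faster spindles:
# effective access time = hit_ratio * cache_time + miss_ratio * disk_time.
# The millisecond figures are assumed ballparks for illustration only.

def effective_ms(hit_ratio, cache_ms, disk_ms):
    return hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms

# More RAM, slow 7200rpm disks:
slow_disk_big_cache = effective_ms(0.95, 0.1, 12.0)
# Less RAM, faster 10k disks:
fast_disk_small_cache = effective_ms(0.50, 0.1, 8.0)

print(f"{slow_disk_big_cache:.3f} ms")    # well under a millisecond
print(f"{fast_disk_small_cache:.3f} ms")  # several milliseconds
```

Under these assumptions, the cache-heavy box comes out several times faster on average than the faster-spindle box, which is the point: a request you never send to disk costs almost nothing.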

-- 
  Les Mikesell
   lesmikesell@gmail.com



Re: Repository storage question (RAID)

Posted by Thomas Harold <tg...@tgharold.com>.
Talden wrote:
> You'll get more redundancy with the 8-disk solution: though you're
> proportionately increasing the chance of a drive failing somewhere in
> the set, you're reducing the volume of data exposed to a failure of two
> drives in the same pair. So 8 drives give more redundancy due to less
> data per drive.
> 
> But don't quote me, I'm no statistician...

With the 4-disk RAID10 we have now, we have a single hot-spare.  With 
the 8-disk setup, we would dedicate 2 hot-spares.  I'm not a statistician 
either, so it boils down to whether a 2nd drive in the same mirror set 
might fail before the hot-spare can be spun up and synchronized.

In RAID10, that time is directly related to the sequential read/write 
speed of a single drive in the array combined with the size of the 
individual drives.  I know that rebuild times for a pair of 750GB units 
are around 3 hours (give or take a bit).  I'm not sure what the rebuild 
time for a 320GB/400GB drive would be, but I'd expect a number that is 
also in the 3-hour range.
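
The arithmetic behind that expectation (the MB/s figures are assumed ballparks for 2006-era SATA drives, not measurements):

```python
# Rebuild time for one RAID1 member is roughly drive size divided by the
# sustained sequential rate.  Per-drive rates below are assumptions.

def rebuild_hours(size_gb, seq_mb_per_s):
    return size_gb * 1000 / seq_mb_per_s / 3600

print(round(rebuild_hours(750, 70), 1))  # 3.0 -- matches the observed ~3h
print(round(rebuild_hours(320, 30), 1))  # 3.0 -- smaller but slower drive
```

Smaller drives of the same era also stream more slowly, so the rebuild window stays in the same rough range rather than shrinking proportionally.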

Or I can go one step more paranoid and build the RAID10 array by hand 
with three drives in each mirror set.  All three drives in each RAID1 
slice can be active, with a RAID0 layer laid over the top of the RAID1 
slices.  You then gain additional reliability at the cost of only 33% 
net capacity and a more complex recovery structure (you may have to 
script mdadm hot-spare rebuilding unless mdadm can share hot-spares 
between multiple arrays).

But I'm not anywhere near that paranoid and will instead rely on backups 
for restoration of the data.

> I wouldn't think the extra performance is going to produce a
> significant improvement.  Unless most of your files are large enough
> to harness the transfer rate the varying seek times and controller
> overhead will likely suck up the gains.  This is all assuming you can
> even compute and transmit the data to/from the client quickly enough
> to get any improvement.

I wasn't sure whether the additional spindles would drive random access 
times down enough to be worth the extra drive bays.  The issue isn't so 
much the SVN access, it's everything else that also lives on that set of 
RAID10 drives (SVN is just one of the multiple Xen DomUs running).

Or whether I need to suck it up and move to a faster RPM drive (driving 
our costs up).  In which case I would probably merely break the heavy 
disk activities out to a 2nd, dedicated box with SCSI/SAS/SATA 10k disks.

> You're also probably near to saturating the IO channel even with 4 drives.

I've run a 6-disk RAID10 array with 750GB SATA drives that is (according 
to bonnie++ and a 16GB test area) capable of 192MB/s for sequential 
reads.  I don't recall what the CPU usage was (it was probably tying up 
most of the 2nd CPU core, though).  I've even seen burst numbers in the 
220MB/s range.  Under load, I've seen the 6-disk set drop to around 
20-25MB/s, which is still a decent performance level.
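
Those bonnie++ numbers line up with a simple streaming-read model (the ~70 MB/s per-drive rate is an assumed figure for drives of that class):

```python
# Ideal-case sequential read for RAID10: N/2 mirror pairs striped, with
# a large read streaming from one drive per pair -- or from both, if the
# md driver balances reads across mirrors.  70 MB/s/drive is assumed.

def raid10_seq_read_mb(drives, per_drive_mb, drives_per_pair_used=1):
    pairs = drives // 2
    return pairs * drives_per_pair_used * per_drive_mb

print(raid10_seq_read_mb(6, 70))     # 210 -- close to the 192 MB/s measured
print(raid10_seq_read_mb(6, 70, 2))  # 420 -- ceiling if both mirrors serve reads
```

Measuring 192 MB/s against a ~210 MB/s one-drive-per-pair model suggests the array was already streaming near its practical limit.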

PCIe chipsets definitely have a lot better usable bandwidth than PCI 
did.  Quad-core CPUs next year will help mitigate the CPU-utilization 
issue.

Hardware RAID is nice (lower CPU usage, lower bus I/O traffic), but when 
a motherboard chipset has 6 or 7 SATA ports, it's very nice to use them 
in conjunction with software RAID to reduce startup costs.  Software 
RAID also has a larger comfort factor: I only need access to "N" SATA 
ports in order to get at my data, with absolutely no ties to a 
particular controller or configuration (or kernel revision or driver 
revision).


Re: Repository storage question (RAID)

Posted by Talden <ta...@gmail.com>.
You'll get more redundancy with the 8-disk solution: though you're
proportionately increasing the chance of a drive failing somewhere in
the set, you're reducing the volume of data exposed to a failure of two
drives in the same pair. So 8 drives give more redundancy due to less
data per drive.

But don't quote me, I'm no statistician...

I wouldn't think the extra performance is going to produce a
significant improvement.  Unless most of your files are large enough
to harness the transfer rate the varying seek times and controller
overhead will likely suck up the gains.  This is all assuming you can
even compute and transmit the data to/from the client quickly enough
to get any improvement.

You're also probably near to saturating the IO channel even with 4 drives.


What kinds of infrastructure are other people running their Subversion
on and what are the stats of your repositories?  It would be good to
get a feel for what kinds of use Subversion has been put to and on
what hardware.

E.g., extrapolating our current CVS usage into what I expect for the
production Subversion repository (when we get there) would give
something like:

  Legacy/unmigrated projects
  20,000 files, 3,500 folders, <500 commits per year.

  Main
  18,000 files, 5,500 folders, 8,000-10,000 commits per year.

3 teams of 10 devs each will hit the Main repository 99% of the time,
and the 3 teams are rarely concurrent as we are spread around the globe,
6-8 hours apart each.  One team is local (LAN) to the server.
High-latency, low-bandwidth access sucks big-time with CVS for the rest
of us...

CVS is currently on a 4-disk RAID10 on a lowish-spec dual-CPU server box.

--
Talden


On 28/11/06, Thomas Harold <tg...@tgharold.com> wrote:
> Semi-off-topic / semi-on-topic...
>
> I'm getting ready to bulk out our storage for our repositories (and
> there's other stuff running in the background, but that happens during
> off-peak hours).
>
> Would it be better to go with a 4-disk RAID10 made up of 750GB SATA
> drives, or an 8-disk RAID10 made up of 320GB or 400GB SATA drives?  Do
> the extra spindles gain us enough to make the power increase worth it?
>
> (This would all be Software RAID in Linux 2.6 done on a PCIe box with
> plenty of I/O bandwidth.)
>
