Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@red-bean.com> on 2009/06/23 19:28:24 UTC

libsvn_fs_bigtable and general svn problems

Hi folks,

As you may have noticed, Google Code's Subversion service is still
pretty slow... still much slower than a stock Subversion/Apache server
running on a single box.  It turns out to be tricky to work with
bigtable:  you get massive scalability, but in return you have to
convert all of BDB's disk i/o calls into network RPCs.  On a single
box, the disk i/o calls get faster over time as the OS eventually ends
up caching the underlying files in memory.  But network RPCs
are slow and stay slow.  :-/

If you haven't yet heard the rumors, it's true: Jon Trowbridge is now
working on re-optimizing our libsvn_fs_bigtable library to be much
faster.  This not only means some major bigtable schema changes, but
evolving our filesystem implementation to no longer be just a simple
fork of libsvn_fs_base.  We're making some really huge changes to our
implementation, and Jon has started encountering quite a number of
problems in Subversion's core code and design.  When I say "problems",
what I really mean is "lots of hidden assumptions" about things that
somebody doing a clean-room implementation of svn_fs.h would run into.
Some of these things are simply brittle tests or poor documentation
in svn_fs.h, but others seem to be actual bugs that we've been
lucky to avoid just because libsvn_fs_{base|fs} happen to do
things a certain way!

In any case, I've asked Jon to start posting to this list when he
encounters such things, and I wanted to give people some context for
his postings.

Honestly, I'm pretty excited by this feedback.  It's rare when we get
such an incredibly fresh set of eyes on our code!  We strive to make
our docs and APIs as open as possible, so I'm really keen on hearing
when they're not working correctly for newcomers.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2364646

Re: libsvn_fs_bigtable and general svn problems

Posted by Blair Zajac <bl...@orcaware.com>.
Ben Collins-Sussman wrote:
> On Mon, Aug 17, 2009 at 5:24 PM, Blair Zajac<bl...@orcaware.com> wrote:
>> Ben Collins-Sussman wrote:
>>> Hi folks,
>>>
>>> As you may have noticed, Google Code's Subversion service is still
>>> pretty slow... still much slower than a stock Subversion/Apache server
>>> running on a single box.  It turns out to be tricky to work with
>>> bigtable:  you get massive scalability, but in return you have to
>>> convert all of BDB's disk i/o calls into network RPCs.  On a single
>>> box, the disk i/o calls get faster over time as the OS eventually ends
>>> up swapping the underlying filesystem into memory.  But network RPCs
>>> are slow and stay slow.  :-/
>> Ben,
>>
>> Are there any writeups on the specifics of the svn_fs.h to BigTable mapping?
>> How are the paths and node-ids mapped to BigTable's key and columns?
> 
> The original port that fitz and I did was fairly brain-dead:  we
> simply forked libsvn_fs_base, and replaced BDB calls with Bigtable
> calls.  Instead of BDB managing 10 hashtables on disk, we now had
> Bigtable managing the same 10 "columns" in a single Bigtable.
> 
> It certainly worked, but it was heinously slow.  Our whole BDB backend
> assumes that reads & writes are essentially free.  Sure enough,
> any reasonable OS will eventually page the BDB files directly into
> memory and then access *is* essentially free.  However, by converting
> these BDB calls to Bigtable network RPCs, we experienced a 10x
> slowdown.  And nothing ever makes the network RPCs faster over time.
> :-)
> 
> We eventually got the system up to a slow-but-usable speed through the
> judicious use of gigantic LRU caches.  That's what you see today.
> Jon's project, however, is building a completely new implementation --
> one with a bigtable schema designed from scratch, designed to make as
> few Bigtable RPCs as possible.  I'm not sure it's safe for me to spill
> all the details of that schema to the public just yet;  I may need to
> get an official nod from someone first.

Ben,

Thanks for the info.  Seeing the schema you and Jon come up with would be very 
interesting.

Regards,
Blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2385447

Re: libsvn_fs_bigtable and general svn problems

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On Mon, Aug 17, 2009 at 5:24 PM, Blair Zajac<bl...@orcaware.com> wrote:
> Ben Collins-Sussman wrote:
>>
>> Hi folks,
>>
>> As you may have noticed, Google Code's Subversion service is still
>> pretty slow... still much slower than a stock Subversion/Apache server
>> running on a single box.  It turns out to be tricky to work with
>> bigtable:  you get massive scalability, but in return you have to
>> convert all of BDB's disk i/o calls into network RPCs.  On a single
>> box, the disk i/o calls get faster over time as the OS eventually ends
>> up swapping the underlying filesystem into memory.  But network RPCs
>> are slow and stay slow.  :-/
>
> Ben,
>
> Are there any writeups on the specifics of the svn_fs.h to BigTable mapping?
> How are the paths and node-ids mapped to BigTable's key and columns?

The original port that fitz and I did was fairly brain-dead:  we
simply forked libsvn_fs_base, and replaced BDB calls with Bigtable
calls.  Instead of BDB managing 10 hashtables on disk, we now had
Bigtable managing the same 10 "columns" in a single Bigtable.
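As a rough illustration of that original port, here is a minimal
sketch, assuming each former BDB table becomes one column of a single
wide table and each BDB key becomes a row key.  The class, the row-key
choice, and the sample keys and values are all hypothetical; this is
not the real libsvn_fs_bigtable schema, which is not described in this
thread.

```python
# Hypothetical model: one Bigtable-like table whose columns stand in
# for the former BDB tables.  Not the real libsvn_fs_bigtable schema.
from collections import defaultdict


class SingleTableStore:
    """Toy in-memory stand-in for a single wide table."""

    def __init__(self):
        # rows[row_key][column] -> value
        self.rows = defaultdict(dict)
        # In the real system every get/put would be a network round
        # trip, so count them to make the cost visible.
        self.rpc_count = 0

    def put(self, column, key, value):
        self.rpc_count += 1
        self.rows[key][column] = value

    def get(self, column, key):
        self.rpc_count += 1
        return self.rows.get(key, {}).get(column)


store = SingleTableStore()
# Column names are illustrative placeholders for former BDB tables.
store.put("revisions", "1", "txn-a")
store.put("nodes", "0.0.1", "dir-listing")
value = store.get("revisions", "1")
missing = store.get("nodes", "no-such-key")
```

The point of the counter is that a code path written against BDB,
where a lookup is nearly free, turns every one of those lookups into a
separate round trip once the store is remote.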

It certainly worked, but it was heinously slow.  Our whole BDB backend
assumes that reads & writes are essentially free.  Sure enough,
any reasonable OS will eventually page the BDB files directly into
memory and then access *is* essentially free.  However, by converting
these BDB calls to Bigtable network RPCs, we experienced a 10x
slowdown.  And nothing ever makes the network RPCs faster over time.
:-)

We eventually got the system up to a slow-but-usable speed through the
judicious use of gigantic LRU caches.  That's what you see today.
Jon's project, however, is building a completely new implementation --
one with a bigtable schema designed from scratch, designed to make as
few Bigtable RPCs as possible.  I'm not sure it's safe for me to spill
all the details of that schema to the public just yet;  I may need to
get an official nod from someone first.
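
For readers unfamiliar with the caching approach mentioned above, here
is a minimal sketch of an LRU cache placed in front of a slow remote
read.  It assumes a hypothetical fetch function (slow_remote_read)
standing in for a Bigtable RPC; the names and the tiny capacity are
for illustration only and say nothing about how the caches in the real
service are built.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache in front of an expensive fetch."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # called only on a cache miss
        self.capacity = capacity    # maximum number of cached entries
        self.entries = OrderedDict()
        self.misses = 0             # number of calls that hit the backend

    def get(self, key):
        if key in self.entries:
            # Mark the entry as most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value


def slow_remote_read(key):
    # Hypothetical stand-in for a network RPC to the backing store.
    return "value-for-" + key


cache = LRUCache(slow_remote_read, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
```

A cache like this only helps for keys that are read again before they
are evicted, which is why it can make a system usable without removing
the underlying per-RPC cost.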

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2385200


Re: libsvn_fs_bigtable and general svn problems

Posted by Blair Zajac <bl...@orcaware.com>.
Ben Collins-Sussman wrote:
> Hi folks,
> 
> As you may have noticed, Google Code's Subversion service is still
> pretty slow... still much slower than a stock Subversion/Apache server
> running on a single box.  It turns out to be tricky to work with
> bigtable:  you get massive scalability, but in return you have to
> convert all of BDB's disk i/o calls into network RPCs.  On a single
> box, the disk i/o calls get faster over time as the OS eventually ends
> up swapping the underlying filesystem into memory.  But network RPCs
> are slow and stay slow.  :-/

Ben,

Are there any writeups on the specifics of the svn_fs.h to BigTable mapping? How 
are the paths and node-ids mapped to BigTable's key and columns?

I'm curious because I'm finishing a distributed versioned asset management
system using svn as the backend database.  The system supports one global
namespace that is distributed across facilities (e.g. in Los Angeles and
Bristol, England) and has to support writes everywhere, even with network
partitioning, for the facility that owns a portion of the namespace.

Reading up on Cassandra, Hypertable and other distributed databases, I'm
wondering whether, if I were starting this project now instead of two years
ago, I would have chosen Hadoop with Hypertable (similar to GFS and BigTable,
respectively) and provided an svn_fs.h-like API on top of that for the asset
management system to use.

Comments welcome.

Regards,
Blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2384554