Posted to dev@subversion.apache.org by John Coiner <jo...@amd.com> on 2009/02/21 17:23:12 UTC
SVN scalability problem as number of tags grows
Hi SVN developers,
I support SVN for a few hundred co-workers. We have been using SVN
heavily for about two years, generating about 60000 commits, 90000 tags,
and 3000 branches in one repository.
We have recently discovered a scalability problem. If you follow the
usual "trunk/tags/branches" structure, the size required to store each
new tag grows in proportion to the number of tags previously created.
This can be demonstrated in just a few commands, in a brand new repository:
svnadmin create test_repo
svn list file:///home/john/testsvn/test_repo
svn mkdir file:///home/john/testsvn/test_repo/trunk -m ''
svn mkdir file:///home/john/testsvn/test_repo/tags -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag1 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag2 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag3 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag4 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag5 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag6 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag7 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag8 -m ''
svn copy file:///home/john/testsvn/test_repo/trunk file:///home/john/testsvn/test_repo/tags/tag9 -m ''
In the FSFS, each new revs/ entry is larger than the previous one. In
the output of 'ls' below, revs 3 through 11 correspond to the creation
of the tag1 through tag9 directories:
john@pitfall:~/testsvn/test_repo/db/revs/0$ ls -latr
total 56
-rw-r--r-- 1 john john 115 2009-02-21 11:19 0
drwxr-sr-x 3 john john 4096 2009-02-21 11:19 ..
-rw-r--r-- 1 john john 277 2009-02-21 11:19 1
-rw-r--r-- 1 john john 305 2009-02-21 11:19 2
-rw-r--r-- 1 john john 531 2009-02-21 11:19 3
-rw-r--r-- 1 john john 564 2009-02-21 11:19 4
-rw-r--r-- 1 john john 595 2009-02-21 11:19 5
-rw-r--r-- 1 john john 628 2009-02-21 11:19 6
-rw-r--r-- 1 john john 659 2009-02-21 11:19 7
-rw-r--r-- 1 john john 690 2009-02-21 11:20 8
-rw-r--r-- 1 john john 721 2009-02-21 11:20 9
-rw-r--r-- 1 john john 762 2009-02-21 11:20 10
-rw-r--r-- 1 john john 800 2009-02-21 11:20 11
drwxr-sr-x 2 john john 4096 2009-02-21 11:20 .
After creating 90000 tags, each new tag consumes megabytes of space in
the repository. Also each new tag takes a few seconds to apply, up from
milliseconds when we first began. We had the expectation of more
graceful scaling, based in part on our experience in other situations
where SVN scales well, for example committing a million additions to the
same file.
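As a rough cross-check of the "megabytes" figure, the revision file sizes in the listing above can be extrapolated. This is only a sketch: it assumes the per-tag growth stays linear up to 90000 tags and that tag names stay as short as tag1..tag9, so it is closer to a lower bound than a measurement. The variable names are illustrative.

```python
# Sizes in bytes of revs 3..11 from the 'ls' listing (creation of tag1..tag9).
rev_sizes = [531, 564, 595, 628, 659, 690, 721, 762, 800]

# Growth of the revision file from one tag creation to the next.
deltas = [later - earlier for earlier, later in zip(rev_sizes, rev_sizes[1:])]
bytes_per_existing_tag = sum(deltas) / len(deltas)

# Assumption: the same per-entry cost still holds once 90000 tags exist.
existing_tags = 90000
estimated_rev_size = bytes_per_existing_tag * existing_tags

print(deltas)
print(f"{bytes_per_existing_tag:.1f} bytes per existing tag")
print(f"~{estimated_rev_size / 1e6:.1f} MB for the next tag's revision file")
```

The average growth is about 34 bytes per existing tag, which extrapolates to roughly 3 MB per new tag at 90000 tags. Longer tag names would push this higher.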
Our big installation is running on Linux, SVN 1.4.4, and FSFS. The
problem also exists in SVN 1.5.1.
Is this a known issue? Are there plans to make this more scalable? I
searched the issues database and did not find anything that looked like
a duplicate. Should I file a new issue?
Do you have any recommendations for a workaround?
One workaround that we are evaluating is to shard the branches and tags
over a large number of directories. So rather than create
"tags/TAG_NAME", we may begin to create "tags2/1/b/5/e/TAG_NAME". The
"1/b/5/e" is the first four hex digits of the md5 hash of "TAG_NAME". We
chose "tags2" as the base directory to avoid colliding with existing
entries under "tags/" that happen to be named after a hex digit.
This scales better. Applying N sharded tags requires O(N) space and each
tag takes O(1) time to apply.
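The sharded path described above could be computed along these lines. This is a minimal sketch, not the script we actually run: the function name sharded_tag_path and the choice of UTF-8 for encoding the tag name before hashing are assumptions.

```python
import hashlib


def sharded_tag_path(tag_name, base="tags2", levels=4):
    """Build base/<h1>/<h2>/<h3>/<h4>/tag_name from the md5 of tag_name.

    Each of the first `levels` hex digits of the digest becomes one
    directory level. Encoding the name as UTF-8 is an assumption.
    """
    digest = hashlib.md5(tag_name.encode("utf-8")).hexdigest()
    shards = list(digest[:levels])
    return "/".join([base, *shards, tag_name])


print(sharded_tag_path("TAG_NAME"))
```

With four hex digits there are 16^4 = 65536 leaf directories, so 90000 tags would average fewer than two entries per leaf, and no intermediate directory ever holds more than 16 entries.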
One possible resolution of this issue is a documentation-only change. If
the SVN book described the scalability issue and recommended a sharded
tags and branches structure, it would help future "enterprise" adopters
(and other crazy people who create way too many tags :)
Please let me know if you need any more information about this problem.
Cheers,
John
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1204071
Re: SVN scalability problem as number of tags grows
Posted by Greg Hudson <gh...@mit.edu>.
On Sat, 2009-02-21 at 12:23 -0500, John Coiner wrote:
> Is this a known issue? Are there plans to make this more scalable? I
> searched the issues database and did not find anything that looked like
> a duplicate. Should I file a new issue?
It is a known issue that svn's back end storage of directories with many
entries isn't terribly efficient. All revisions of all directory lists
are stored in full, so a directory with many entries takes O(n) time to
modify and O(n) space to hold each new revision (O(n^2) space total, if
the number of changes is proportional to the number of entries).
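To make the O(n^2) total concrete, here is a small model of the behaviour described above. It counts directory entries rather than bytes, and assumes each revision that adds one tag stores the complete listing of a single flat directory. The function name is illustrative.

```python
def flat_total_entries(n_tags):
    """Total entries written when every new tag rewrites one flat listing."""
    # The revision that adds tag k stores a listing with k entries,
    # so the total is 1 + 2 + ... + n = n(n+1)/2.
    return sum(range(1, n_tags + 1))


for n in (9, 1000, 90000):
    print(n, flat_total_entries(n))
```

For 90000 tags this comes to 4,050,045,000 stored entries in total, against 90000 entries in the final listing.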
Since we use directories to hold tags, this issue applies to large
numbers of tags if they are stored in a single flat directory, as the
usual convention suggests.
I don't know of any plans to make this more scalable. It would require
a significant rearchitecting of directory storage. One approach would
be to use a balanced tree with many roots to hold all revisions of a
directory--but to do that, we'd have to store all revisions of a
directory together (not necessarily in the same disk blocks, but in some
fashion designed to avoid excessive seeking). In FSFS, because of other
design constraints, that's simply not practical. In BDB it might be more
tractable.
> Do you have any recommendations for a workaround?
Organizing the tags in a tree structure is probably the best workaround,
as you have already found.
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1204282
RE: Re: SVN scalability problem as number of tags grows
Posted by Andy Bolstridge <an...@bolstridge.plus.com>.
> Another possibility is to use rev #s rather than tags. If we were
> starting over we might do this. (Given our infrastructure already
> deployed atop SVN, which already uses tags, switching to rev #s is a
> riskier change than switching to sharded tags.)
>
Someone once suggested storing a 'label' text that mapped to a revnum, so you could have human-readable 'tags' without having to create the tag branches. IIRC he got shot down in flames, but I think the suggestion was a good one, especially if you create many tag branches and they are not quite as cheap as described.
Sure, adding new entries means more data, but if you add more data and branch as well, and then make lots of tags, you're going to see a significant increase in storage sooner rather than later.
I haven't seen a problem with it yet (and I have 12 GB and 300,000 revisions), but this does act as a warning not to start creating tag branches, thanks.
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2313467
Re: SVN scalability problem as number of tags grows
Posted by John Coiner <jo...@amd.com>.
Greg Stein wrote:
> It really doesn't have anything to do with tags/ per se, but simply
> that you're creating an ever-larger directory. The size of the
> name:node mapping for the directory contents will continue to grow as
> you add new entries into that directory.
Yes, agreed.
> The sharding is the appropriate solution. You could shard by date,
> initial letters of the tag, or the hash of the tag (as you suggested).
> Just settle on one, and you should be fine.
Thank you, it's nice to have a vote of confidence.
It would be nice if the svn book had a warning about this. It would be
extra nice if the svn book had a section on the scalability of several
common operations.
One of my coworkers has tested SVN scalability in a number of
situations. So the data exists. I'll get in touch with the svn book
project and see if they would like a contribution.
> Another solution would be to delete obsolete tags... Lots of possibilities.
Agreed. It's difficult for us to know which tags are obsolete, which is
our own problem.
Another possibility is to use rev #s rather than tags. If we were
starting over we might do this. (Given our infrastructure already
deployed atop SVN, which already uses tags, switching to rev #s is a
riskier change than switching to sharded tags.)
Thanks for your help with this. Cheers,
John
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1204451
Re: SVN scalability problem as number of tags grows
Posted by Greg Stein <gs...@gmail.com>.
Hi John,
It really doesn't have anything to do with tags/ per se, but simply
that you're creating an ever-larger directory. The size of the
name:node mapping for the directory contents will continue to grow as
you add new entries into that directory.
You'd see the exact same problem if you created 90000 entries in
/trunk/some/path/down/deep/in/the/hierarchy/.
The sharding is the appropriate solution. You could shard by date,
initial letters of the tag, or the hash of the tag (as you suggested).
Just settle on one, and you should be fine.
Another solution would be to delete obsolete tags. Note that they will
always be there in history, just not in HEAD. You could also rotate
tags into an archival tag directory. For example, each month, you
could move tags into /tags/archive/2008-12/ and
/tags/archive/2009-01/. Or even /archived-tags/... for that matter.
Lots of possibilities. I think the right answer is going to depend
upon your workflow, to determine what will work best for you. Main
point: creating directories with 90k entries *is* going to consume
more time and space.
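The monthly rotation mentioned above could derive its target path like this. It is only a sketch: the function name archive_path and the idea of keying on a tag's creation date are assumptions, and the actual move would still be done with something like svn move.

```python
from datetime import date


def archive_path(tag_name, created, root="/tags/archive"):
    """Return the monthly archive location, e.g. /tags/archive/2008-12/<tag>."""
    # date supports strftime-style format specs inside f-strings.
    return f"{root}/{created:%Y-%m}/{tag_name}"


print(archive_path("tag1", date(2008, 12, 15)))  # /tags/archive/2008-12/tag1
```

Each monthly directory then only grows for one month, so the cost of adding an entry is bounded by that month's tag count rather than by the whole history.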
Cheers,
-g
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1204276