Posted to dev@subversion.apache.org by Karl Fogel <kf...@newton.ch.collab.net> on 2002/08/15 16:09:51 UTC

Re: Import error and general questions

Your import error might be related to 

   http://subversion.tigris.org/issues/show_bug.cgi?id=860

Anyway, obviously yes, it's a bug :-).  If we can't identify it as an
already-known bug from your description, then we'll probably ask you
for a minimal data set that reproduces the problem (so don't throw
those files out just yet).

But I suspect someone can recognize this as a manifestation of a known
scalability problem; does it ring a bell, anyone?  (In fact, I feel
like I ought to recognize it myself, but I don't).

Shay Harding <sh...@ccbill.com> writes:
> Second question relates to the 'db' directory within the repository.
> What is the 'strings' file used for? After importing only 1 of many
> directory trees, it sits at 62M. Will it matter how big this file gets?
> I have roughly 15 projects to import, although not all of them are
> nearly as large as 65M.

The `strings' table is where all the file contents are stored, so yes,
it will get about as big as the tree you import (the other tables hold
metadata such as directory entries).

So it just gets larger the more data you import.  In the
multi-100s-of-megabytes range you're talking about, it shouldn't be a
problem I think.
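
For instance, a listing of the repository's `db' directory might look
something like this (illustrative only: the table names come from the
schema, but the sizes are made up, and Berkeley DB's own environment
and log files are omitted):

   $ ls -lh repos/db
   ...  900K  nodes            # metadata: files and directory entries
   ...  150K  revisions
   ...  3.2M  representations  # metadata: how strings form file versions
   ...   62M  strings          # the actual file contents
   ...  200K  transactions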

But hmmm.  Don't some systems start having problems when a single file
is in the multi-*gigabyte* range?  Or is that only when trying to
access it through stdio?  I presume Berkeley DB already has
optimizations for large files, but will check...

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Import error and general questions

Posted by Ben Collins-Sussman <su...@collab.net>.
Bernd Walter <ti...@cicely5.cicely.de> writes:

> The commit failed after a long time because I wedged a hook script.
> The string db was left at 300M in size.

I would expect that; you had a 300M transaction tree that never got
converted into a revision.  'svnadmin lstxns' and 'svnadmin rmtxns'
could be used to shrink your db back down.
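
Something along these lines, for example (a sketch; the repository
path is made up):

   # List the uncommitted transactions left behind in the repository:
   svnadmin lstxns /var/svn/repos

   # Remove them all; rmtxns takes the names that lstxns prints:
   svnadmin lstxns /var/svn/repos | xargs svnadmin rmtxns /var/svn/repos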


Re: Import error and general questions

Posted by Bernd Walter <ti...@cicely5.cicely.de>.
On Thu, Aug 15, 2002 at 12:19:41PM -0500, Karl Fogel wrote:
> Shay Harding <sh...@ccbill.com> writes:
> > 105MB of mixed data (images, docs, text, etc)
> > 6205 total files
> > 
> > Started off ok with 305MB of memory being used before the "Transmitting
> > data..." text showed up. Then it dropped to around 200MB, so 100MB was
> > freed at that point.
> >
> > About halfway through all the little dots, memory usage jumped back to
> > around 340MB and steadily grew from there. Using top and ps, it was not
> > readily apparent what was using so much memory as no httpd was above
> > 10MB and svn was around 50MB (60MB at its peak usage).
> > 
> > When the dots stopped all of a sudden an httpd process started eating
> > all available memory to include 1G of swap space:
> 
> Hmmm.  It sounds like the same bug I mentioned, yeah.  One question
> is: what sizes are the largest files you're importing?

I once tried to commit (not import) the complete FreeBSD source.
The source is in the 300M range and has several thousand files.
I had an httpd process of around 100M and an svn process of around
300M in size.
The commit failed after a long time because I wedged a hook script.
The strings db was left at 300M in size.
I thought the records were cleared and just the file not shrunk,
but after the next try I had a strings db of around 600M and the commit
failed with a timeout on MERGE.

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de



Re: Import error and general questions

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Shay Harding <sh...@ccbill.com> writes:
> > Hmmm.  It sounds like the same bug I mentioned, yeah.  One question
> > is: what sizes are the largest files you're importing?
> 
> Biggest file is about 4.5MB

That's interesting, and somewhat unexpected.  Looks like we've
probably regressed in server-side memory performance.  

Can you annotate issue #860, giving your numbers, and pointing out how
they indicate that the problem might be with the total number of bytes
coming through, not necessarily with the size of any individual file?

Thanks (& just ping me if no time, so I can fix up the issue),
-Karl


Re: Import error and general questions

Posted by Shay Harding <sh...@ccbill.com>.
On Thu, 2002-08-15 at 10:19, Karl Fogel wrote:
> Shay Harding <sh...@ccbill.com> writes:
> > 105MB of mixed data (images, docs, text, etc)
> > 6205 total files
> > 
> > Started off ok with 305MB of memory being used before the "Transmitting
> > data..." text showed up. Then it dropped to around 200MB, so 100MB was
> > freed at that point.
> >
> > About halfway through all the little dots, memory usage jumped back to
> > around 340MB and steadily grew from there. Using top and ps, it was not
> > readily apparent what was using so much memory as no httpd was above
> > 10MB and svn was around 50MB (60MB at its peak usage).
> > 
> > When the dots stopped all of a sudden an httpd process started eating
> > all available memory to include 1G of swap space:
> 
> Hmmm.  It sounds like the same bug I mentioned, yeah.  One question
> is: what sizes are the largest files you're importing?

Biggest file is about 4.5MB


Shay





Re: Import error and general questions

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Shay Harding <sh...@ccbill.com> writes:
> 105MB of mixed data (images, docs, text, etc)
> 6205 total files
> 
> Started off ok with 305MB of memory being used before the "Transmitting
> data..." text showed up. Then it dropped to around 200MB, so 100MB was
> freed at that point.
>
> About halfway through all the little dots, memory usage jumped back to
> around 340MB and steadily grew from there. Using top and ps, it was not
> readily apparent what was using so much memory as no httpd was above
> 10MB and svn was around 50MB (60MB at its peak usage).
> 
> When the dots stopped all of a sudden an httpd process started eating
> all available memory to include 1G of swap space:

Hmmm.  It sounds like the same bug I mentioned, yeah.  One question
is: what sizes are the largest files you're importing?

-K


APR largefile support was Re: Import error and general questions

Posted by Justin Erenkrantz <je...@apache.org>.
On Thu, Aug 15, 2002 at 05:28:12PM -0500, Karl Fogel wrote:
> "Glenn A. Thompson" <gt...@cdr.net> writes:
> > So Subversion will be able to handle 54+ GB of data?
> > If so, does BDB handle it across multiple files, say for the "Strings" table?

FWIW, a patch was recently submitted to dev@apr to get APR to
support largefiles too.

So, that patch should be integrated too, but I don't really have
any machines to test it on, so I can't commit it.  -- justin


Re: Import error and general questions

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
"Glenn A. Thompson" <gt...@cdr.net> writes:
> So Subversion will be able to handle 54+ GB of data?
> If so, does BDB handle it across multiple files, say for the "Strings" table?

Frankly, I don't know yet.  The answer to that may be
system-dependent, like the effect of Berkeley's `--disable-largefile'
flag is system-dependent:

http://www.sleepycat.com/docs/ref/build_unix/conf.html#--disable-largefile
says:

    --disable-largefile:

    Some systems, notably versions of HP/UX and Solaris, require
    special compile-time options in order to create files larger than
    2^32 bytes. These options are automatically enabled when Berkeley
    DB is compiled. For this reason, binaries built on current
    versions of these systems may not run on earlier versions of the
    system because the library and system calls necessary for large
    files are not available. To disable building with these
    compile-time options, enter --disable-largefile as an argument to
    configure.
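
For what it's worth, many systems can report the compile-time options
they need for large files.  A quick check, assuming a POSIX `getconf':

   # Print the extra compiler flags this system needs for files larger
   # than 2^32 bytes; empty output generally means off_t is already
   # 64 bits wide:
   getconf LFS_CFLAGS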


Re: Import error and general questions

Posted by "Glenn A. Thompson" <gt...@cdr.net>.

Karl Fogel wrote:

> Brian Behlendorf <br...@collab.net> writes:
> > > Anyway filesize is worth keeping an eye on.
> >
> > Yep.  We have one customer we hope to move to Subversion soon who has 54+
> > GB of ,v files in their CVS repo.  This customer also likes checking in
> > large binary files, so there's a chance the binary deltas will compress it
> > a bit, but clearly that's too much for a single big db file.
>
> Just so people know: this issue (#860) is scheduled to be fixed by the
> 0.14.3 tarball, which is for September 5th.
>
> And hey, we've fixed it before, so how hard can it be? :-)
>
> -K

So Subversion will be able to handle 54+ GB of data?
If so, does BDB handle it across multiple files, say for the "Strings" table?
gat



Re: Import error and general questions

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Brian Behlendorf <br...@collab.net> writes:
> > Anyway filesize is worth keeping an eye on.
> 
> Yep.  We have one customer we hope to move to Subversion soon who has 54+
> GB of ,v files in their CVS repo.  This customer also likes checking in
> large binary files, so there's a chance the binary deltas will compress it
> a bit, but clearly that's too much for a single big db file.

Just so people know: this issue (#860) is scheduled to be fixed by the
0.14.3 tarball, which is for September 5th.

And hey, we've fixed it before, so how hard can it be? :-)

-K


Re: Import error and general questions

Posted by Brian Behlendorf <br...@collab.net>.
On Thu, 15 Aug 2002, Glenn A. Thompson wrote:
> Anyway filesize is worth keeping an eye on.

Yep.  We have one customer we hope to move to Subversion soon who has 54+
GB of ,v files in their CVS repo.  This customer also likes checking in
large binary files, so there's a chance the binary deltas will compress it
a bit, but clearly that's too much for a single big db file.

	Brian





Re: Import error and general questions

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
Hey:

>
>
> File size probably shouldn't be a problem. Currently running under Linux
> which supports 4G (maybe more?) files. I'm pretty sure BerkeleyDB will
> handle files to that size since MySQL uses it on the back-end for its
> InnoDB tables (I *think*).

No. MySQL has two (actually three; one appears to have died on the
vine) table handlers that are transaction-safe: BDB and InnoDB.  The
BDB implementation was developed by the MySQL folks, I think.  InnoDB
is a separate product, developed by a separate company
(http://www.innoDB.com/), though it has become more closely integrated
with MySQL as of release 4.0.x.  InnoDB is tablespace-based: multiple
tables per tablespace, and multiple files per tablespace.  Its
tablespaces are not as slick as some commercial databases', but they
are getting better with every release.

> At any rate I know we have DB files in excess
> of 1G so I'm not too concerned about that.

As I said, InnoDB supports multiple files per tablespace (or whatever
they call them in MySQL; I forget).  I think InnoDB also supports
auto-extend in some form as of release 4.0.2.  Large files can cause
other interesting complications, though, like: how well does your
backup system handle them, if at all?  Anyway, filesize is worth
keeping an eye on.
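
For example, in my.cnf (syntax as of MySQL 4.0.x; paths and sizes are
made up for illustration):

   # An InnoDB tablespace spread across two data files, with the second
   # file allowed to autoextend as the data grows:
   [mysqld]
   innodb_data_home_dir  = /var/lib/mysql
   innodb_data_file_path = ibdata1:2000M;ibdata2:2000M:autoextend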

gat





Re: Import error and general questions

Posted by Shay Harding <sh...@ccbill.com>.
On Thu, 2002-08-15 at 09:09, Karl Fogel wrote:
> Your import error might be related to 
> 
>    http://subversion.tigris.org/issues/show_bug.cgi?id=860
> 
> Anyway, obviously yes, it's a bug :-).  If we can't identify it as an
> already-known bug from your description, then we'll probably ask you
> for a minimal data set that reproduces the problem (so don't throw
> those files out just yet).
> 
> But I suspect someone can recognize this as a manifestation of a known
> scalability problem; does it ring a bell, anyone?  (In fact, I feel
> like I ought to recognize it myself, but I don't).

After you mentioned this, I went and cleared out the repository to start
fresh. Rebooted the server so all memory was cleared. I then tried to
import the full directory tree:

105MB of mixed data (images, docs, text, etc)
6205 total files

Started off ok with 305MB of memory being used before the "Transmitting
data..." text showed up. Then it dropped to around 200MB, so 100MB was
freed at that point.

About halfway through all the little dots, memory usage jumped back to
around 340MB and steadily grew from there. Using top and ps, it was not
readily apparent what was using so much memory as no httpd was above
10MB and svn was around 50MB (60MB at its peak usage).

When the dots stopped, all of a sudden an httpd process started eating
all available memory, including 1G of swap space:

I removed a lot of the ps output so it would fit without wrapping:


   VSZ    RSS TTY   STAT START  TIME COMMAND
949056 488624 ?     D    09:39   1:24 /var/securewww/bin/httpd
59656 1104 pts/1    S    09:46   0:47 svn import

996160 489664 ?     R    09:39   1:26 /var/securewww/bin/httpd
59656 1116 pts/1    S    09:46   0:47 svn import

1034048 490008 ?    R    09:39   1:27 /var/securewww/bin/httpd
59656 1120 pts/1    S    09:46   0:47 svn import

1115968 485204 ?    R    09:39   1:29 /var/securewww/bin/httpd
59656 1124 pts/1    S    09:46   0:47 svn import

1230656 493436 ?    R    09:39   1:32 /var/securewww/bin/httpd 
59656 1124 pts/1    S    09:46   0:47 svn import

1278784 494124 ?    D    09:39   1:33 /var/securewww/bin/httpd
59656 1128 pts/1    S    09:46   0:47 svn import

1417024 480900 ?    D    09:39   1:37 /var/securewww/bin/httpd
59656 1156 pts/1    S    09:46   0:47 svn import

1447744 493896 ?    R    09:39   1:38 /var/securewww/bin/httpd
59656 1148 pts/1    S    09:46   0:47 svn import

0    0 ?        Z    09:39   1:40 [httpd <defunct>]   #Memory exhausted
59656 11168 pts/1   D    09:46   0:47 svn import


I don't understand what is going on behind the scenes that would cause
the import of 105MB of files to use so much memory.
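
In case anyone wants to reproduce the measurements, a loop along these
lines captures the same columns every few seconds (just a sketch; I was
eyeballing top and ps by hand, so adjust process names and interval):

   # Snapshot httpd and svn memory use during the import; the '=' after
   # each column name suppresses the repeated header line:
   while true; do
       ps -C httpd,svn -o vsz=,rss=,tty=,stat=,start=,time=,comm=
       sleep 5
   done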



> Shay Harding <sh...@ccbill.com> writes:
> > Second question relates to the 'db' directory within the repository.
> > What is the 'strings' file used for? After importing only 1 of many
> > directory trees, it sits at 62M. Will it matter how big this file gets?
> > I have roughly 15 projects to import, although not all of them are
> > nearly as large as 65M.
> 
> The `strings' table is where all the file contents are stored, so yes,
> it will get about as big as the tree you import (the other tables hold
> metadata such as directory entries).
> 
> So it just gets larger the more data you import.  In the
> multi-100s-of-megabytes range you're talking about, it shouldn't be a
> problem I think.
> 
> But hmmm.  Don't some systems start having problems when a single file
> is in the multi-*gigabyte* range?  Or is that only when trying to
> access it through stdio?  I presume Berkeley DB already has
> optimizations for large files, but will check...
> 
> -K

File size probably shouldn't be a problem. Currently running under Linux
which supports 4G (maybe more?) files. I'm pretty sure BerkeleyDB will
handle files to that size since MySQL uses it on the back-end for its
InnoDB tables (I *think*). At any rate I know we have DB files in excess
of 1G so I'm not too concerned about that.



Shay





