Posted to dev@subversion.apache.org by Tom Lord <lo...@emf.net> on 2003/04/05 17:46:46 UTC

BDB log file factoids (prbly part of why they are so big)

I got slightly less lazy and did some homework.  I found this:

       http://www.sleepycat.com/docs/ref/refs/bdb_usenix.html

BDB logs are symmetric -- meaning each log entry contains the state of
a DB record "before and after" the logged event.
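
As an illustration of what "symmetric" implies, here is a minimal sketch of a log entry that carries both images of a record, so the same log can be played in either direction.  The `LogRecord` class and the `redo`/`undo` helpers are hypothetical names invented for this example; they are not BDB's actual record layout or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    """One hypothetical log entry holding both images of a record."""
    key: str
    before: Optional[bytes]  # state prior to the event (None if the record was new)
    after: Optional[bytes]   # state following the event (None if it was deleted)

    def size(self) -> int:
        # The entry costs roughly the sum of both images.
        return len(self.before or b"") + len(self.after or b"")


def redo(db: Dict[str, bytes], log: List[LogRecord]) -> None:
    """Roll forward: apply the 'after' image of every entry, oldest first."""
    for rec in log:
        if rec.after is None:
            db.pop(rec.key, None)
        else:
            db[rec.key] = rec.after


def undo(db: Dict[str, bytes], log: List[LogRecord]) -> None:
    """Roll back: apply the 'before' image of every entry, newest first."""
    for rec in reversed(log):
        if rec.before is None:
            db.pop(rec.key, None)
        else:
            db[rec.key] = rec.before
```

Note that `size()` grows with the full size of both images, regardless of how small the logical change was.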

The paper isn't explicit about whether log entries are k/v pairs or
pages (or lines).  I would _guess_ (and half remember) that they are
pages (for hash tables and btrees -- probably lines for recno dbs)
just because that would be far easier to get right.

A svn client request gets translated into changes to a bunch of DB
pages, updating both the application data and BDB's internal data
structures.  Leaving aside whether it is only the application changes
or all changed pages that are logged, we can focus on just the
application data:

The amount of application data that can change will (if I understand
svn right) sometimes be quite substantial (as when a file is
redeltified).  In a case such as that, committing a small delta to the
database can expand the log with complete before and after copies of
the entire history of the redeltified file.  (Aside from the space
implications, that suggests that some care has to be taken if the
commit rate of small changes is high -- e.g., in a wiki.)  [And a
question: is the deltified history of a file a single BDB datum, or is
it broken into records at commit boundaries?  If the former, then even
absent redeltification, any change to a file logs the complete
history, before and after.]  Additionally, each commit will add to the
log before and after copies of any modified directories.

So, just based on the before/after snapshots of client data -- I think
it's not very surprising svn client requests generate comparatively
large logs.

The paper's authors describe three conditions from which BDB is
designed to recover (if only one of the three occurs): loss of system
memory (e.g., application or system crash), loss of log file or loss
of database (e.g., disk failure).   

The benefits of recovering from a disk failure are likely to be only
partially realized by svn users.  Most, I suspect, will store both sets
of data on the same disk.  Some will, it seems, need to discard log
files more frequently than they back up the database.  A corrupt log
file is not likely to be noticed until it is needed.  And I doubt many
users will be attracted to the possibility of spinning off log files
to tertiary storage, or running a daemon to validate them.  The
primary (but not only) use for log files in svn is, therefore,
recovery from system and application crashes.

Application-level logging, essentially just journaling the incoming
client requests, offers some interesting potential benefits.

As noted above, BDB logs appear likely to be large compared to the
size of the request stream.  Simply reasoning about the nature of the
requests suggests that, except in the case where changes to
unversioned properties constitute a large percentage (size-wise) of
client requests, the size of a request journal and the size of the
database itself will be in the same ballpark -- they will grow at
comparable rates.  (Beyond that, the journal can be compressed.)

The smaller size of a journal will relax the pressure to spin logs off
to tertiary storage and reduce the cost of backups.  Using either
technique (log or journals), it would be prudent to store the log (or
journal) on a different device from the main database.  Because a
journal will likely be much smaller, it is more practical to find
space for it on another device.

The two disk failure scenarios (loss of log, loss of data file) have
different recovery procedures.  The recovery from a detected loss of a
log is simply to back up the latest data file.   The recovery from a
loss of a data file is to start with a backup and play the log
forward from the date of that backup.   Write-ahead application-level
journaling, as advocated here, is sufficient for recovery from a loss
of data file using the play-back technique.
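
The play-back technique can be sketched roughly as follows.  This is only an illustration under stated assumptions: the JSON-lines journal format, the `seq` field, and the helpers `journal_append`, `apply_request` and `recover` are all invented here, and a dict stands in for the database.

```python
import json
import os
from typing import Dict, Iterator


def journal_append(path: str, request: dict) -> None:
    """Write-ahead step: persist the request before the database is touched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(request, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())  # force the entry to stable storage


def journal_read(path: str) -> Iterator[dict]:
    """Yield journaled requests in the order they were written."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


def apply_request(db: Dict[str, str], request: dict) -> None:
    """Stand-in for the real commit logic; here a request sets one path."""
    db[request["path"]] = request["content"]


def recover(backup: Dict[str, str], journal_path: str,
            backup_seq: int) -> Dict[str, str]:
    """Start from the backup and replay every request journaled after it."""
    db = dict(backup)  # leave the backup itself untouched
    for request in journal_read(journal_path):
        if request["seq"] > backup_seq:
            apply_request(db, request)
    return db
```

The journal only has to record what the client asked for, so each entry is about the size of the request itself.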

The BDB paper does not consider a scenario that I think should not be
ignored for a product at svn's phase in its life cycle (in spite of
its excellent history in this regard): recovery from application-level
bugs.  If it has been a week since I backed up my repository, but only
half a day since a svn bug corrupted its database, then playing the
journal forward from the last backup to one half day ago provides a
nice recovery.
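
A sketch of that scenario, assuming (hypothetically) that each journal entry carries a timestamp and that the administrator knows roughly when the corruption began.  The `Entry` layout and the `replay_until` name are made up for this example.

```python
from typing import Dict, Iterable, Tuple

Entry = Tuple[float, str, str]  # (timestamp, path, content): hypothetical layout


def replay_until(backup: Dict[str, str], journal: Iterable[Entry],
                 backup_time: float, cutoff: float) -> Dict[str, str]:
    """Replay entries with backup_time < timestamp < cutoff onto a copy
    of the backup, stopping short of the request that triggered the bug."""
    db = dict(backup)
    for timestamp, path, content in journal:
        if backup_time < timestamp < cutoff:
            db[path] = content
    return db
```

With timestamps measured in days since the backup, a call such as `replay_until(backup, journal, backup_time=0.0, cutoff=6.5)` would recover everything up to half a day before day 7.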

Unlike a BDB log, an app level journal and its playback mechanism
provide a database-independent recovery technique for loss of database
data on disk.  Recovery code and administration tools will be
unchanged even if BDB is replaced by an RDBMS or other storage
manager.

Finally, an app level journal presents some obvious benefits for
diagnostic purposes, both concerning the operation of svn, and
concerning the actions taken by clients.  (On the latter point, it
provides an ideal record for finding the exact point at which a
malicious client began operation, and then seeing exactly what that
client did.)

So, to sum it all up:

BDB logs for a svn repository are likely to grow at a much faster rate
than the data file.   This makes them more expensive to administer in
a way that gives the benefits of recovery from data file loss and
recovery from app-level data file corruption.   Examples of this in
action can be seen in user reports of failures due to filled disks
and surprises due to disk space consumption.

An app-level journal is likely to grow at a rate much closer to the
rate at which the data file itself grows.   If you can afford a file
the size of your database, you can just about as easily afford a
journal file.   Use of an app-level journal may make it more likely
that more users will experience the recoverability benefits of
write-ahead logging.

App-level journaling also has advantages for: database-independent
recovery and administration tools, and diagnostic examination of the
history of a svn repository.

And just to clear up, redundantly, one point that caused some confusion:
I am not suggesting that if app-level journals are added, BDB logs
would be disabled.  Instead, I'm saying that if app-level journals are
added, BDB logs can be automatically pruned, so that their size need
never exceed the amount by which a log grows between flushes (outside
of txns) of data file data to stable storage.
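
The pruning rule can be modelled as below.  This is a toy model and not BDB code: the `LogFile` tuple, the LSN bookkeeping and the `prunable` function are assumptions made for illustration.  (Berkeley DB itself ships a `db_archive` utility for identifying log files that are no longer in use; whether it fits this exact policy is not something the sketch tries to settle.)

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    name: str
    first_lsn: int  # first log sequence number stored in this file
    last_lsn: int   # last log sequence number stored in this file


def prunable(logs: List[LogFile], flushed_lsn: int,
             oldest_active_txn_lsn: int) -> List[str]:
    """Names of log files whose every entry is already reflected in the
    data file on stable storage and which no open txn still needs."""
    horizon = min(flushed_lsn, oldest_active_txn_lsn)
    return [log.name for log in logs if log.last_lsn < horizon]
```

Under this model the retained log never covers more than the work done since the last flush, plus whatever the oldest open txn still pins.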

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: BDB log file factoids (prbly part of why they are so big)

Posted by Tom Lord <lo...@emf.net>.

       But you're right in your subsequent message that we're probably
       writing out the file contents twice to the logfile and once to
       the db, which is pretty wasteful in disk bandwidth and
       short-term space use.

Incidentally, it'd be interesting for someone to instrument an active
repository to measure:

	(a) sizes of incoming write txn traffic

	(b) corresponding growth in BDB data file size

	(c) corresponding growth in BDB log file size

If you care about the recoverability enabled by log files (and some
users won't, of course), then that can give you an empirical formula 
for relating disk space requirements to back-up schedules.
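
One possible shape for such a formula, as a sketch only.  The `Sample` fields mirror (a), (b) and (c) above; the function names are hypothetical, and any numbers fed to it would have to come from real measurements, since none are available in this thread.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    txn_bytes: int    # (a) size of incoming write txn traffic
    data_growth: int  # (b) corresponding growth in BDB data file size
    log_growth: int   # (c) corresponding growth in BDB log file size


def log_bytes_per_txn_byte(samples: List[Sample]) -> float:
    """Observed log growth per byte of incoming write traffic."""
    total_txn = sum(s.txn_bytes for s in samples)
    total_log = sum(s.log_growth for s in samples)
    return total_log / total_txn


def log_space_between_backups(samples: List[Sample],
                              txn_bytes_per_day: float,
                              days_between_backups: float) -> float:
    """Disk space the logs need if they are kept until the next backup."""
    ratio = log_bytes_per_txn_byte(samples)
    return ratio * txn_bytes_per_day * days_between_backups
```

The same ratio computed from `data_growth` instead of `log_growth` would show how much faster the log grows than the data file.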

-t


Re: BDB log file factoids (prbly part of why they are so big)

Posted by Tom Lord <lo...@emf.net>.

       But you're right in your subsequent message that we're probably
       writing out the file contents twice to the logfile and once to
       the db, which is pretty wasteful in disk bandwidth and
       short-term space use.


Yup.  That's the big thing and thanks for the sanity check -- as I
say, sorry I'm scrambling to catch up on implementation details here.

It's maybe worth repeating that for maximally robust operation,
"short-term" here means "between backups" and yadda yadda.

I'm really torn between two paths to fixing that, if one were inclined
to do so:

On the one hand, app-level journaling has all the aforementioned
benefits and is probably desirable anyway for diagnostic purposes.

On the other hand, the deeper, harder to swallow solution of changing
the data model a bit seems like the clear long-term winner to me.
To expand on that:

You use "cheap tree cloning" to do branching and tagging.  I think it
has a third use: revisioning.  There's no need at all to version the
entire fs -- just clone the archived sub-tree you're committing to
create a new revision and encode the new revision's name in the fs
namespace.  Of course, that means introducing an explicit notion of
"project" to the abstraction, whereas currently "project" is entirely
a user policy notion carved out of the fs namespace.  Looking beyond
the BDB implementation, this approach can really cut down on i/o
bandwidth and space use, not to mention improving the overall rev ctl
model.  (But, yeah, the big problem with any such suggestion is the
pressure to get 1.0 out.)
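
To make the idea concrete, here is a minimal sketch of revisioning by cloning only the committed sub-tree.  Nested dicts stand in for directories, and the names `clone_with_change` and `commit_revision`, as well as the `fs[project][revision]` layout, are invented for illustration; this is not how svn's fs is structured.

```python
from typing import Dict, Tuple, Union

Node = Union[str, Dict[str, "Node"]]  # a file's content, or a directory


def clone_with_change(root: Node, path: Tuple[str, ...], content: str) -> Node:
    """Return a new tree in which `path` holds `content`.

    Only the directories along `path` are copied; every other subtree is
    shared by reference with the old tree, which stays intact.
    """
    if not path:
        return content
    assert isinstance(root, dict), "path runs through a file"
    new_dir = dict(root)  # shallow copy: children are shared, not duplicated
    new_dir[path[0]] = clone_with_change(root.get(path[0], {}), path[1:], content)
    return new_dir


def commit_revision(fs: Dict[str, Dict[str, Node]], project: str, base: str,
                    new_name: str, path: Tuple[str, ...], content: str) -> None:
    """Store the clone under its own name inside the project's namespace;
    other projects in the same fs are never touched."""
    fs[project][new_name] = clone_with_change(fs[project][base], path, content)
```

A commit to one project then writes only the nodes on the changed path, and revision names are per project rather than global.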

So, out in "if wishes and buts were candy and nuts"-land, I guess the
Right Thing would be to take the fork in the road: add app-level
journaling AND look for an affordable way to hide (in the UI) the
revid and impose a little more structure on the fs early -- then look
into fancier BDB alternatives later.

-t



Re: BDB log file factoids (prbly part of why they are so big)

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2003-04-05 at 12:46, Tom Lord wrote:
> The amount of application data that can change will (if I understand
> svn right) sometimes be quite substantial (as when a file is
> redeltified).  In a case such as that, committing a small delta to the
> database can expand the log with complete before and after copies of
> the entire history of the redeltified file.

If there are N previous revisions of a file, then at most lg(N) deltas
are rewritten by the redeltification process (and the average number of
rewritten deltas is a constant less than 2).  But you're right in your
subsequent message that we're probably writing out the file contents
twice to the logfile and once to the db, which is pretty wasteful in
disk bandwidth and short-term space use.
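
To see how both bounds can hold at once, here is a toy model.  The rule below is an assumption chosen because it is consistent with the numbers stated above; it is not a transcription of Subversion's redeltification code.  It assumes a commit with n predecessors rewrites one delta plus one more for each time 2 divides n, and that an exact power of two leaves the oldest revision alone.

```python
import math


def deltas_rewritten(n_predecessors: int) -> int:
    """Hypothetical rule: 1 + (number of times 2 divides n), except that
    an exact power of two skips the oldest revision."""
    n = n_predecessors
    trailing_zeros = (n & -n).bit_length() - 1
    count = 1 + trailing_zeros
    if n > 1 and (n & (n - 1)) == 0:  # exact power of two
        count -= 1
    return count


def average_rewritten(max_predecessors: int) -> float:
    """Mean number of rewritten deltas over commits 1..max_predecessors."""
    total = sum(deltas_rewritten(n) for n in range(1, max_predecessors + 1))
    return total / max_predecessors
```

Under this model half of all commits rewrite a single delta, a quarter rewrite two, and so on, which is why the average stays below 2 while the worst case grows like lg(N).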



Re: BDB log file factoids (prbly part of why they are so big)

Posted by Tom Lord <lo...@emf.net>.

	Database logs don't provide serializing.  

I didn't say that they do.

-t


Re: BDB log file factoids (prbly part of why they are so big)

Posted by ryan <ry...@netidea.com>.
> I'll go out on a speculative limb a bit here, and beyond the topic of
> log file mgt: it seems to me (on the basis of the "structure"
> document) that you pay a steep cost in database updates and log space
> to provide the "global-to-repository revision number" which
> effectively serializes all write txns to svn databases.  Furthermore,
> I see little actual user benefit to that serialization: writes to
> unrelated projects or branches within a repository need not be

Hi,

Database logs don't provide serializing.  They provide the 'D' in ACID
compliance.  It's possible to get serialization without logs, using OS level
semaphores for example.
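
A small sketch of that point: writers serialized purely by a semaphore, with no log involved.  An in-process `threading.BoundedSemaphore` stands in here for an OS-level semaphore, and the `RevisionCounter` class is a hypothetical example, not svn code.

```python
import threading
from typing import List


class RevisionCounter:
    """Hands out strictly increasing revision numbers, one writer at a time."""

    def __init__(self) -> None:
        self._sem = threading.BoundedSemaphore(1)
        self._latest = 0
        self.history: List[str] = []

    def commit(self, author: str) -> int:
        with self._sem:  # only one writer at a time gets past this line
            self._latest += 1
            self.history.append(author)
            return self._latest
```

This gives ordering but no durability: if the process dies, nothing here helps recover the counter, which is the job the log does.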

Now, there is the question of whether db4 recycles the logs often enough.
That really isn't a Subversion question.

One possibility: if you consider each Subversion repository a 'project',
then you could place a structure on top of that such as you're suggesting,
thus allowing branching and merging between svn repos across the network,
i.e.:
svn copy http://url1 http://url2

To branch from one repository to another, or perhaps to a local one:
svn copy http://netrepository file:///home/me/mybranch

Note then that in this case the revision numbers are specific to a
project/branch as you suggest.

I really think the log file 'issue' is a red herring.

Regards,
-ryan 



Re: BDB log file factoids (prbly part of why they are so big)

Posted by Tom Lord <lo...@emf.net>.

Ok, sorry to be playing catch-up here:

       [And a question: is the deltified history of a file a single
       BDB datum, or is it broken into records at commit boundaries?
       If the former, then even absent redeltification, any change to
       a file logs the complete history, before and after.]

Looking at some hopefully-still-sufficiently-up-to-date documentation
(subversion/libsvn_fs/structure in a slightly out-of-date source
tree), and hopefully not misunderstanding it too badly: the latest
revision of a file on each branch is always kept full text, right?

So the implication is that upon a commit to some file, the previous
head revision is destructively replaced by a delta, the new revision
stored full-text?

So that means that to commit a change to a file, the BDB log file
should grow by at least:

	2 * full_text_size + delta_size

for that file.  (Where `full_text_size' is the average of the full
text sizes of the old and new head revisions.)


To accomplish the same effect, an app-level journal would grow by:

	delta_size

which, uncoincidentally enough, is the same approximate amount by
which the db data file grows for that part of the operation.
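
The two estimates can be compared with a few lines of arithmetic.  The function names are made up for this sketch, and the byte counts in the usage note are invented purely to show the scale of the difference.

```python
def bdb_log_growth(old_full: int, new_full: int, delta: int) -> float:
    """Lower bound from above: 2 * full_text_size + delta_size, where
    full_text_size is the average of the old and new head revisions."""
    full_text_size = (old_full + new_full) / 2
    return 2 * full_text_size + delta


def journal_growth(delta: int) -> int:
    """An app-level journal only records the incoming delta."""
    return delta
```

For a hypothetical 100,000-byte file that grows by 200 bytes through a 500-byte delta, `bdb_log_growth(100_000, 100_200, 500)` gives 200,700 bytes against 500 bytes for the journal, a factor of about 400.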

The BDB log and the database data file will also grow to reflect
updates to the containing directory and its ancestors (again, if I'm
reading this right) -- none of that would appear in an app-level
journal.

I'll go out on a speculative limb a bit here, and beyond the topic of
log file mgt: it seems to me (on the basis of the "structure"
document) that you pay a steep cost in database updates and log space
to provide the "global-to-repository revision number" which
effectively serializes all write txns to svn databases.  Furthermore,
I see little actual user benefit to that serialization: writes to
unrelated projects or branches within a repository need not be
ordered.  None of this would really matter for 1.0, except that revids
appear in the user interface, are important to merging, and are likely
to be incorporated into user scripts and usage habits.  Oddly enough,
with the sort of archish-structure I've talked about layering over
svn, I think it quite plausible to surface a slightly different UI in
which revids are truly hidden -- thus opening up a new degree of
freedom for the implementation.  Recall that in my proposal, the role
currently played by the revid is instead played by
(project/branch-specific) names in the fs namespace.


-t
