Posted to dev@subversion.apache.org by "Eric S. Raymond" <es...@thyrsus.com> on 2001/01/05 03:54:38 UTC

Re: I'll help with db debugging

Chip Salzenberg <ch...@valinux.com>:
> I'll be glad to help with debugging the db crash.  I've been wanting
> to get into subversion for a while now, and there's no place better
> for a newbie to start than with the ugly grotty stuff that no one
> wants to do....

Larry McVoy told me this weekend at the Kernel Hackers' Summit that
Subversion is using db, a non-transparent binary format, to store critical
state.  He also said "There was cheering and shouting in the halls at 
Bitkeeper when we heard that news.  Those <deleted>s just added a year
to their development time."

Larry is right.  This choice was a major, major blunder that fills me
with unease about the future of this project.  What little you gain in
performance you will lose in multiplied difficulties and schedule slips
because corruption will be so much harder to detect and recover from.

Please reverse this bad choice *now*, before you get nibbled to death and
bogged down in db-related problems.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

According to the National Crime Survey administered by the Bureau of
the Census and the National Institute of Justice, it was found that
only 12 percent of those who use a gun to resist assault are injured,
as are 17 percent of those who use a gun to resist robbery. These
percentages are 27 and 25 percent, respectively, if they passively
comply with the felon's demands. Three times as many were injured if
they used other means of resistance.
        -- G. Kleck, "Policy Lessons from Recent Gun Control Research,"

Re: I'll help with db debugging

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.

On Tuesday 03 April 2001 00:42, you wrote:
> Arved Sandstrom <Ar...@chebucto.ns.ca>:
> > Couldn't help throwing in my 1.2 cents Canadian on this one,
> > though. To start with, exactly what is the magical (transparent
> > non-binary) format that will make it _easy_ to detect corruption and
> > recover from same? I can throw out a few guesses, but I won't.
>
> I don't know what the format should be.  If I were designing it, it
> would probably be some sort of XML with auxiliary binary
> name-to-offset indices that get regenerated on the fly whenever the
> text data part is newer than the index.  That way you get both the
> speed advantages of a binary format and the transparency advantages of
> text.

Interesting. In effect, this is a very PDF'ish idea. I happen to like that 
latter format for exactly the reasons you bring out.

> > Number two, a team that's prone to writing code that garbages up a DB is
> > going to be prone to writing code that garbages up a text (non-binary)
> > format.
>
> True, but not the point.  The point is that it's a lot easier for human
> eyeballs to grok patterns in text than in binary.  So it's easier to spot,
> diagnose, and correct corruption bugs.

Right, no argument from me on this, specifically. I'm a bit skeptical as to 
how useful this capability might turn out to be in real usage.

> > In the final analysis, though, why mention a putative "major"
> > problem without explicitly mentioning a solution? I'm curious.
>
> I didn't mention a solution because I don't have one.  That doesn't mean
> I can't see a big whacking problem when it stares me in the face -- in
> fact, I'm embarrassed that it took Larry McVoy to bring my attention to it.
>
> As Donald Knuth said "Premature optimization is the root of all evil."
> Binary formats are almost always premature optimization -- they sacrifice
> debuggability (and, hence, development time) for performance gains that
> are usually marginal.  They should be used sparingly, and usually only
> as automatically-regenerated caches or fast indexes for text masters.

We may be talking at cross-purposes here. It sounds like you are mainly 
anti-binary-format for a specific usage, and I took your original post to 
indicate an across-the-board dislike for binary. Of course, you might 
actually have an across-the-board dislike for binary, in which case we will 
respectfully disagree. :-)

Sorry for any hints of cynicism in my earlier post. I am well into XML 
overload - 3 years ago I loved it; now I tolerate it. :-) Still, your above 
idea sounds pretty sweet.

Regards,
Arved Sandstrom

Re: choice of DB

Posted by Greg Stein <gs...@lyra.org>.

On Tue, Apr 03, 2001 at 07:20:08AM +0000, Tripp Lilley wrote:
> On Tue, 3 Apr 2001, Greg Stein wrote:
> > In this case, it isn't a premature optimization. We know that data access
> > will be a large issue. Second, the choice of binary vs text was independent
> > of our main decision point: programmer productivity. Why implement a (text)
> > database, when we already have a database ready and waiting for us?
>...
> Is svn storing the plaintext head revision and plaintext deltas "in the
> filesystem", or "in the DB"?

Everything in the DB (under the API of the "SVN filesystem (FS)").

> Again calling on my experience with Perforce,
> I know that they store the -source- itself in RCS form reverse-delta
> plaintext files. They then store the -metadata- in "some binary format".
> Should the metadata become corrupt, you will not have the entire
> integration / submission / etc., history, but you will at least have the
> files themselves, and the individual revision history of the files (which
> is obviously better than zilch, and usually sufficient relief for anyone
> suffering a catastrophic failure in the absence of regular backups).

Depending on what gets hit, it could be a minor impact, or it could be a
large impact. But we are not designing for disk failure or filesystem
corruption. I relegate that problem and its resolutions to the operating
system. There is only so much you can defend against before you simply toss
out the operating system and code your own server in assembly. Oh, and toss
out the BIOS and the firmware on your hard drives. Oh, and make sure you use
redundancy across your DIMMs to protect against bad DIMMs. And ... ;-)

The counter to my extreme example is simply that you have to trust
*something* in all the software that builds up your system. People just like
to draw the lines of trust in different areas. The SVN team has chosen to
trust Apache, APR, Neon, Expat, and DB. Those are five *huge* pieces of
code, which is why SVN is relatively small for what it accomplishes.

[ a line count shows 45 kloc ]

> If you're at present storing -everything- in the DB, it might be at least
> worthwhile to investigate such separation as described above for "a future
> release".

Subversion 2.0 will have a pluggable database backend. My two personal goals
for 2.0 are the database and (better) WebDAV/DeltaV compatibility. I'll use
any veto power I can to ensure that 2.0 doesn't go out the door without
those features :-)

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: choice of DB (was: Re: I'll help with db debugging)

Posted by Tripp Lilley <tl...@perspex.com>.

On Tue, 3 Apr 2001, Greg Stein wrote:

> In this case, it isn't a premature optimization. We know that data access
> will be a large issue. Second, the choice of binary vs text was independent
> of our main decision point: programmer productivity. Why implement a (text)
> database, when we already have a database ready and waiting for us?

I preface my comments by reminding you that I've been too busy reading the
list to read the design docs :) So if you're already doing this... you
know.

Is svn storing the plaintext head revision and plaintext deltas "in the
filesystem", or "in the DB"? Again calling on my experience with Perforce,
I know that they store the -source- itself in RCS form reverse-delta
plaintext files. They then store the -metadata- in "some binary format".
Should the metadata become corrupt, you will not have the entire
integration / submission / etc., history, but you will at least have the
files themselves, and the individual revision history of the files (which
is obviously better than zilch, and usually sufficient relief for anyone
suffering a catastrophic failure in the absence of regular backups).

If you're at present storing -everything- in the DB, it might be at least
worthwhile to investigate such separation as described above for "a future
release".

-- 
   Joy-Loving * Tripp Lilley  *  http://stargate.eheart.sg505.net/~tlilley/
------------------------------------------------------------------------------
  "Fiber makes you poop." -- From <http://www.pvponline.com/bts_studio.php3>

choice of DB (was: Re: I'll help with db debugging)

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Apr 02, 2001 at 08:42:05PM -0400, Eric S. Raymond wrote:
>...
> As Donald Knuth said "Premature optimization is the root of all evil."
> Binary formats are almost always premature optimization -- they sacrifice
> debuggability (and, hence, development time) for performance gains that
> are usually marginal.  They should be used sparingly, and usually only
> as automatically-regenerated caches or fast indexes for text masters.

In this case, it isn't a premature optimization. We know that data access
will be a large issue. Second, the choice of binary vs text was independent
of our main decision point: programmer productivity. Why implement a (text)
database, when we already have a database ready and waiting for us?

I'll take DB any day if it means we can deliver a working system faster than
if we had to roll our own. I'll also take it if we can have more confidence
in our end result. If somebody asks me about DB-inspired data corruption,
I'll laugh. If we rolled our own, I'd be very concerned, for a long while.
If we rolled our own, we'd definitely need text because there'd be all kinds
of problems in it that we'd need to find.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: I'll help with db debugging

Posted by "Eric S. Raymond" <es...@thyrsus.com>.

Arved Sandstrom <Ar...@chebucto.ns.ca>:
> Couldn't help throwing in my 1.2 cents Canadian on this one,
> though. To start with, exactly what is the magical (transparent
> non-binary) format that will make it _easy_ to detect corruption and
> recover from same? I can throw out a few guesses, but I won't.

I don't know what the format should be.  If I were designing it, it
would probably be some sort of XML with auxiliary binary
name-to-offset indices that get regenerated on the fly whenever the
text data part is newer than the index.  That way you get both the
speed advantages of a binary format and the transparency advantages of
text.
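
A minimal sketch of the scheme described above, under stated assumptions: a
line-oriented "name = value" text file stands in for the XML master, and the
index layout (2-byte name length, name, 8-byte offset) is invented for
illustration. The function names build_index, load_index and lookup are
hypothetical and are not part of Subversion.

```python
import os
import struct


def build_index(text_path, index_path):
    """Scan the text master and write a binary name-to-offset index."""
    with open(text_path, "rb") as text, open(index_path, "wb") as index:
        offset = 0
        for line in text:
            name = line.split(b"=", 1)[0].strip()
            # Entry layout (an assumption): 2-byte length, name, 8-byte offset.
            index.write(struct.pack(">H", len(name)))
            index.write(name)
            index.write(struct.pack(">Q", offset))
            offset += len(line)


def load_index(text_path, index_path):
    """Return {name: offset}, rebuilding the index if it is missing or stale."""
    # Staleness is judged by modification time only, so an edit made within
    # the filesystem's timestamp granularity could go unnoticed.
    if (not os.path.exists(index_path)
            or os.path.getmtime(index_path) < os.path.getmtime(text_path)):
        build_index(text_path, index_path)
    with open(index_path, "rb") as index:
        data = index.read()
    entries = {}
    pos = 0
    while pos < len(data):
        (name_len,) = struct.unpack_from(">H", data, pos)
        pos += 2
        name = data[pos:pos + name_len]
        pos += name_len
        (offset,) = struct.unpack_from(">Q", data, pos)
        pos += 8
        entries[name.decode("utf-8")] = offset
    return entries


def lookup(text_path, index_path, name):
    """Seek straight to a record in the text master using the index."""
    offset = load_index(text_path, index_path)[name]
    with open(text_path, "rb") as text:
        text.seek(offset)
        return text.readline().decode("utf-8").rstrip("\n")
```

The text file stays the master copy: deleting the index loses nothing, since
the next lookup rebuilds it from the text.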

> Number two, a team that's prone to writing code that garbages up a DB is 
> going to be prone to writing code that garbages up a text (non-binary) 
> format.

True, but not the point.  The point is that it's a lot easier for human 
eyeballs to grok patterns in text than in binary.  So it's easier to spot,
diagnose, and correct corruption bugs. 

> In the final analysis, though, why mention a putative "major"
> problem without explicitly mentioning a solution? I'm curious.

I didn't mention a solution because I don't have one.  That doesn't mean
I can't see a big whacking problem when it stares me in the face -- in fact,
I'm embarrassed that it took Larry McVoy to bring my attention to it.

As Donald Knuth said "Premature optimization is the root of all evil."
Binary formats are almost always premature optimization -- they sacrifice
debuggability (and, hence, development time) for performance gains that
are usually marginal.  They should be used sparingly, and usually only
as automatically-regenerated caches or fast indexes for text masters.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Non-cooperation with evil is as much a duty as cooperation with good.
	-- Mohandas Gandhi

Re: I'll help with db debugging

Posted by Arved Sandstrom <Ar...@chebucto.ns.ca>.

On Friday 05 January 2001 03:54, Eric S. Raymond wrote:
> Chip Salzenberg <ch...@valinux.com>:
> > I'll be glad to help with debugging the db crash.  I've been wanting
> > to get into subversion for a while now, and there's no place better
> > for a newbie to start than with the ugly grotty stuff that no one
> > wants to do....
>
> Larry McVoy told me this weekend at the Kernel Hackers' Summit that
> Subversion is using db, a non-transparent binary format, to store critical
> state.  He also said "There was cheering and shouting in the halls at
> Bitkeeper when we heard that news.  Those <deleted>s just added a year
> to their development time."
>
> Larry is right.  This choice was a major, major blunder that fills me
> with unease about the future of this project.  What little you gain in
> performance you will lose in multiplied difficulties and schedule slips
> because corruption will be so much harder to detect and recover from.
>
> Please reverse this bad choice *now*, before you get nibbled to death and
> bogged down in db-related problems.

I'm just monitoring this list because I picked up on it as a good example of 
open-source process; it's giving me useful ideas elsewhere. I am otherwise 
uninvolved, other than being keen on seeing Subversion succeed.

Couldn't help throwing in my 1.2 cents Canadian on this one, though. To start 
with, exactly what is the magical (transparent non-binary) format that will 
make it _easy_ to detect corruption and recover from same? I can throw out a 
few guesses, but I won't.

Number two, a team that's prone to writing code that garbages up a DB is 
going to be prone to writing code that garbages up a text (non-binary) 
format. For all but a very few situations I can tell you what my remedial 
action would be in both cases - go to the repository backup, and rollback the 
Subversion repository to a known state. Just like with any other source 
control system. At some time in the future when I am fortunate enough to do 
CM with Subversion I can assure everyone that that backup and recovery 
strategy will not change.

I don't think either binary formats or typical DBMSs are that shaky. I also 
don't think text formats are that robust, not when they are essentially 
read-write data. I've dealt with text data for a long time - my background is 
scientific data processing up until '96 or so, and I can assure you that it's 
just as easy to corrupt text as it is to corrupt binary. I'll grant that the 
garbage that results is human-readable in the one case, and not in the other.

In the final analysis, though, why mention a putative "major" problem without 
explicitly mentioning a solution? I'm curious.

Regards,
Arved Sandstrom

Re: I'll help with db debugging

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

"Eric S. Raymond" <es...@thyrsus.com> writes:
> Chip Salzenberg <ch...@valinux.com>:
> > I'll be glad to help with debugging the db crash.  I've been wanting
> > to get into subversion for a while now, and there's no place better
> > for a newbie to start than with the ugly grotty stuff that no one
> > wants to do....
> 
> Larry McVoy told me this weekend at the Kernel Hackers' Summit that
> Subversion is using db, a non-transparent binary format, to store critical
> state.  He also said "There was cheering and shouting in the halls at 
> Bitkeeper when we heard that news.  Those <deleted>s just added a year
> to their development time."
> 
> Larry is right.  This choice was a major, major blunder that fills me
> with unease about the future of this project.  What little you gain in
> performance you will lose in multiplied difficulties and schedule slips
> because corruption will be so much harder to detect and recover from.
> 
> Please reverse this bad choice *now*, before you get nibbled to death and
> bogged down in db-related problems.

Yes, we're using Berkeley DB for the repository storage.  However, all
the keys and values are stored in a processor-independent,
human-readable format.  So the repository can easily be dumped and
examined in any text editor using the stock Berkeley DB table dump
facility.  In fact, Berkeley DB includes a program which acts as the
inverse of the dumping program, and can recreate a database from the
human-readable form.
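
The dump/reload round trip described above can be sketched as follows. This
is only an illustration of the idea of a text dump with an exact inverse: it
does not call or imitate the actual Berkeley DB tools, the table is a plain
Python dict, and the names dump_table and load_table, like the one-item-per-
line JSON-escaped format, are assumptions made for the example.

```python
import json


def dump_table(table):
    """Render a {str: str} table as alternating key/value lines of text."""
    lines = []
    for key in sorted(table):
        # json.dumps escapes newlines and quotes, so each item fits one line.
        lines.append(json.dumps(key))
        lines.append(json.dumps(table[key]))
    return "".join(line + "\n" for line in lines)


def load_table(text):
    """Inverse of dump_table: rebuild the table from its text dump."""
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        raise ValueError("dump must contain key/value line pairs")
    return {
        json.loads(lines[i]): json.loads(lines[i + 1])
        for i in range(0, len(lines), 2)
    }
```

Because loading is the exact inverse of dumping, a dump can be inspected or
repaired in a text editor and then turned back into a table.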

I think Berkeley DB was a good choice as a back end.  It is mature
software; it provides atomic transactions, crash recovery, and hot
backups; and it has a reputation for being efficient.

The biggest problem with Berkeley DB is that the database format has
historically changed frequently.  To deal with that, we're planning to
write some very simple table dumping programs, which use no system
calls other than `seek' and `read', and simply keep an archive of
programs that can read the tables produced by every version of
Berkeley DB we've ever used.  (Perhaps the Sleepycat people will help
us with this.)

As has been said before, I'm flattered that Bitkeeper considers us
such a threat.  And I'm sorry that you feel we've made a poor choice.
And as always, the code is yours to improve.

NeXTSTEP objects (was: Re: I'll help with db debugging)

Posted by Greg Stein <gs...@lyra.org>.

On Thu, Apr 05, 2001 at 02:16:19PM -0400, Garance A Drosihn wrote:
>...
> For what it's worth, in the days of NeXTSTEP, objects were
> stored in things called "NIBs".  These were binary files for
> efficiency reasons, but I have the impression that many
> developers felt they would have been better as plain-text
> files of some format.

Damn straight they would have been nice in plain text. While you're working
on the code, it would be nice to see what the heck got serialized for your
object. While your object changes during development, etc, you want to
inspect those things, tweak them, whatever.

Let's not mix apples and oranges, though. We've got our data serialized in a
*text* format, but stored into a database where the metadata happens to be
binary. We have the flexibility of text and the data structures are quite
flexible/adaptable (meaning we can easily and backwards-compatibly change
the structure that is serialized). Further, our use case and application are
a bit different from NeXTSTEP objects.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: I'll help with db debugging

Posted by "Deven T. Corzine" <de...@ties.org>.

On Thu, 5 Apr 2001, Garance A Drosihn wrote:

> For what it's worth, in the days of NeXTSTEP, objects were
> stored in things called "NIBs".  These were binary files for
> efficiency reasons, but I have the impression that many
> developers felt they would have been better as plain-text
> files of some format.
> 
> Going with binary formats makes more sense if you know you
> will be isolated to a specific architecture, but if these
> files are going to be the same across many architectures
> (little-endian vs big-endian, 32bit vs 64bit, etc), then
> the program needs to be careful when reading/writing that
> binary info anyway.  So, going to some plain-text format
> isn't all that much more of a performance hit.

Hi Garance!  (I thought your last name was Drosehn, as in your signature?
Why does your email address seem to be based on a spelling of "Drosihn"?)

Yeah, that's a good point.  The performance hit may not be that bad, or at
least not necessarily so.  Simple text-based formats (e.g. RFC822 headers)
can be easily parsed -- complex ones like XML may take a more noticeable
performance hit.  But I'll agree that the performance hit isn't outrageous.
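
As a rough illustration of how little code a simple header-style format
needs, here is a sketch using Python's standard email parser. The field names
(Revision, Author, Log) are hypothetical and are not taken from Subversion.

```python
from email import message_from_string


def parse_headers(text):
    """Parse RFC 822 style 'Name: value' lines that precede a blank line."""
    message = message_from_string(text)
    # items() yields (name, value) pairs in the order they appear.
    return dict(message.items())
```

A record such as "Revision: 42" followed by "Author: jrandom" and a blank
line comes back as an ordinary dictionary of strings.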

There remains an efficiency question, in terms of space used.  While it may
be complex to handle the endianness and wordsize issues, binary formats are
almost invariably more compact.  Of course, they tend to be less flexible
and (of course) unreadable, so there's clearly a tradeoff.  I'm not sure if
space efficiency should be ignored as a factor, especially for a system like
Subversion that's designed as a repository for data.  (Although the obvious
counterargument is that memory and disk keep getting cheaper -- who cares?)

Another possibility might be to use a text-based format, but use Zlib to
compress the files (in gzip-compatible format) for space efficiency in
storage -- this could provide the transparency and convenience of a text
format with the space efficiency of a binary format.  Of course, that would
come at an even greater performance cost...
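
A small sketch of the compressed-text idea, using Python's gzip module (which
is built on zlib and writes gzip-compatible output). The function names pack
and unpack are hypothetical, and how much space is saved depends entirely on
the data.

```python
import gzip


def pack(text):
    """Compress a text record into gzip-compatible bytes."""
    return gzip.compress(text.encode("utf-8"))


def unpack(blob):
    """Recover the original text from its compressed form."""
    return gzip.decompress(blob).decode("utf-8")
```

The stored bytes are unreadable as they sit on disk, but any gzip-aware tool
can recover the text, at the cost of a decompression step on every read.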

Deven

Re: I'll help with db debugging

Posted by Garance A Drosihn <dr...@rpi.edu>.

At 12:01 PM -0400 4/5/01, Deven T. Corzine wrote:
>On Wed, 4 Apr 2001, Chip Salzenberg wrote:
>
>  > According to Deven T. Corzine:
>  > > I see the value in having a readable format, but is
>  > > it necessarily preferable as the native format?
>  >
>  > I think so.  I like having all my normal tools (find, grep,
>  > etc.) work, without a lot of translation rigamarole.
>
>Text-based formats definitely have a convenience factor and
>a transparency that's nice, but you're also taking a
>performance/efficiency hit, to some degree or another.  Is it
>possible to have your cake and eat it too?

For what it's worth, in the days of NeXTSTEP, objects were
stored in things called "NIBs".  These were binary files for
efficiency reasons, but I have the impression that many
developers felt they would have been better as plain-text
files of some format.

Going with binary formats makes more sense if you know you
will be isolated to a specific architecture, but if these
files are going to be the same across many architectures
(little-endian vs big-endian, 32bit vs 64bit, etc), then
the program needs to be careful when reading/writing that
binary info anyway.  So, going to some plain-text format
isn't all that much more of a performance hit.
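
To make the byte-order point concrete, here is a sketch comparing a binary
encoding with an explicitly chosen byte order against a plain decimal text
encoding. The function names are made up for the example, and the 4-byte
big-endian layout is an arbitrary assumption.

```python
import struct


def encode_binary(value):
    """Pack an unsigned 32-bit integer, big-endian on every platform."""
    return struct.pack(">I", value)


def decode_binary(data):
    """Unpack with the same explicit byte order used by encode_binary."""
    return struct.unpack(">I", data)[0]


def encode_text(value):
    """Write the same integer as a decimal text line."""
    return str(value).encode("ascii") + b"\n"


def decode_text(data):
    """Parse a decimal text line; int() ignores the trailing newline."""
    return int(data.decode("ascii"))
```

Reading the big-endian bytes for 1 with a little-endian format string gives
16777216 rather than 1, which is the kind of care a portable binary format
demands; the text form has no byte order to get wrong.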

-- 
Garance Alistair Drosehn            =   gad@eclipse.acs.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu

Re: I'll help with db debugging (and Introduction)

Posted by "Deven T. Corzine" <de...@ties.org>.

On Thu, 5 Apr 2001, Jonathan Leffler wrote:

> However, if the storage manager were designed to handle it, then it
> would be possible to have a readable text representation of the data and
> a not necessarily readable fast-access mechanism.  Which I think counts
> as having your cake and eating it in this context.

Yes, this is an obvious possibility, and someone else suggested a similar
idea, making an index file for the text-based format.  Perhaps that will be
the best approach in the end; I don't know.  Of course, you'd want to be
able to re-index the text-based data file in case you change it by hand.

You can at least get the performance of a binary-only solution this way,
but the space efficiency isn't the same if the text format is verbose...

Deven

Re: I'll help with db debugging (and Introduction)

Posted by Jonathan Leffler <jl...@informix.com>.

On Thu, 5 Apr 2001, Deven T. Corzine wrote:
>On Wed, 4 Apr 2001, Chip Salzenberg wrote:
>> According to Deven T. Corzine:
>> > On Sun, 1 Apr 2001, Chip Salzenberg wrote:
>> > > OK, I'm going to fix the db problem by eliminating db.  :-)
>> >
>> > If the idea is to have a text-based format that's more transparent so that
>> > one can identify corruption visually, wouldn't it work as well to have
>> > mechanisms to dump the DB database into one or more such formats?  I see
>> > the value in having a readable format, but is it necessarily preferable as
>> > the native format?
>>
>> I think so.  I like having all my normal tools (find, grep, etc.) work,
>> without a lot of translation rigamarole.
>
>I understand that, and there's certainly something to be said for it.  At
>the same time, if you're needing to fall back on general-purpose tools like
>that, does that suggest that the application-specific tools are lacking?
>
>Text-based formats definitely have a convenience factor and a transparency
>that's nice, but you're also taking a performance/efficiency hit, to some
>degree or another.  Is it possible to have your cake and eat it too?

Hi,

I should introduce myself, and then I have some comments.

                       ===== Introduction =====

I've been lurking on the list for, oh, 3 days now.  I work for Informix
Software, and one of the areas I deal with is Open Source software.
Specifically, I look after the DBD::Informix module that works with Perl
and DBI.  I also tend to get involved in other database related open
source projects (PHP, Tcl/Tk, Python, ...).  I'm afraid I'm only likely
to be able to lurk on this group rather than contributing much more than
the occasional idea or gob of information.  If it is relevant, I can
test on Sparc Solaris (currently 7) and/or Linux (currently RH6.2)
without having to negotiate with anyone, and I can sometimes find more
obscure machines to work on -- ask and I'll see what can be done.

                        ===== Commentary =====

I'm not sure how Berkeley DB handles this stuff, but if you were using
Informix C-ISAM as the data access mechanism, then you could have your
cake and eat it.  Specifically, C-ISAM stores the data in a .dat file,
and each record has a 'currency marker' at the end.  Active records have
a newline '\n' and deleted records have a NUL '\0'.  The access
information -- indexes -- are stored in a separate .idx file, and
contain lots of binary information.  If you (a) chose to use character
representation for each field and (b) you were able to stick with fixed
length records, then you could have your cake and eat it -- the .dat
files would be legible (the '\0' would be a mild nuisance, but SVN would
seldom be deleting records), and the .idx files would not need to be
viewed.  I suspect (b) is too stringent a requirement; even though
C-ISAM supports variable length records, you cannot index the variable
length portion of the records, and the variable length data is stored in
the index file, not the data file (don't ask why - I don't know; and it
stinks).  More seriously, C-ISAM is a commercial product, which rules it
out from this project.
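
The record layout described above can be sketched as follows. This mirrors
only what the paragraph says (fixed-length character records, each ending in
a newline when active or a NUL when deleted); it is not the real C-ISAM file
format, and RECORD_LEN, make_record and read_active are hypothetical names.

```python
RECORD_LEN = 16  # hypothetical fixed width of the data part, in bytes
ACTIVE = b"\n"   # marker after an active record
DELETED = b"\0"  # marker after a deleted record


def make_record(text, active=True):
    """Pad text to the fixed width and append the currency marker."""
    data = text.encode("ascii").ljust(RECORD_LEN)
    if len(data) != RECORD_LEN:
        raise ValueError("record longer than RECORD_LEN")
    return data + (ACTIVE if active else DELETED)


def read_active(blob):
    """Return the text of every active record in a data file image."""
    step = RECORD_LEN + 1
    records = []
    for start in range(0, len(blob), step):
        chunk = blob[start:start + step]
        # Skip deleted records and any truncated trailing chunk.
        if len(chunk) == step and chunk[-1:] == ACTIVE:
            records.append(chunk[:-1].decode("ascii").rstrip())
    return records
```

With character fields and mostly active records, such a file reads as
ordinary lines of text in any editor, while record N can still be found at
byte offset N * (RECORD_LEN + 1).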

However, if the storage manager were designed to handle it, then it
would be possible to have a readable text representation of the data and
a not necessarily readable fast-access mechanism.  Which I think counts
as having your cake and eating it in this context.

-- 
Yours,
Jonathan Leffler (Jonathan.Leffler@Informix.com) #include <disclaimer.h>
Guardian of DBD::Informix v1.00.PC1 -- http://www.perl.com/CPAN
     "I don't suffer from insanity; I enjoy every minute of it!"

Re: I'll help with db debugging

Posted by "Deven T. Corzine" <de...@ties.org>.

On Wed, 4 Apr 2001, Chip Salzenberg wrote:

> According to Deven T. Corzine:
> > On Sun, 1 Apr 2001, Chip Salzenberg wrote:
> > > OK, I'm going to fix the db problem by eliminating db.  :-)
> > 
> > If the idea is to have a text-based format that's more transparent so that
> > one can identify corruption visually, wouldn't it work as well to have
> > mechanisms to dump the DB database into one or more such formats?  I see
> > the value in having a readable format, but is it necessarily preferable as
> > the native format?
> 
> I think so.  I like having all my normal tools (find, grep, etc.) work,
> without a lot of translation rigamarole.

I understand that, and there's certainly something to be said for it.  At
the same time, if you're needing to fall back on general-purpose tools like
that, does that suggest that the application-specific tools are lacking?

Text-based formats definitely have a convenience factor and a transparency
that's nice, but you're also taking a performance/efficiency hit, to some
degree or another.  Is it possible to have your cake and eat it too?

Deven

Re: I'll help with db debugging

Posted by Chip Salzenberg <ch...@valinux.com>.

According to Deven T. Corzine:
> On Sun, 1 Apr 2001, Chip Salzenberg wrote:
> > OK, I'm going to fix the db problem by eliminating db.  :-)
> 
> If the idea is to have a text-based format that's more transparent so that
> one can identify corruption visually, wouldn't it work as well to have
> mechanisms to dump the DB database into one or more such formats?  I see
> the value in having a readable format, but is it necessarily preferable as
> the native format?

I think so.  I like having all my normal tools (find, grep, etc.) work,
without a lot of translation rigamarole.
-- 
Chip Salzenberg              - a.k.a. -             <ch...@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

Re: I'll help with db debugging

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

Greg Stein <gs...@lyra.org> writes:
> After 1.0, I will *help* with replacing the DB backend. I'd like to see a
> SQL backend in there. Some others want a pure-text backend. It should all be
> possible. Our interfaces between the FS core and the databases feel pretty
> trim at the moment, but we'll just have to see. (I believe we need to make
> skel's a DB-specific thing, which means the impact could actually be pretty
> large)

Right.  I think the main value of using Berkeley DB is that it will
get us going faster.  There's a bunch of stuff that is hard to get
right that we simply didn't have to worry about (atomicity,
recoverability, hot backups).  If someone wants to implement a
pure-text back end that performs as well and has these characteristics
--- for real, not just "well, it's never crashed on me..." --- that
would be great.  The idea behind keeping the FS interface simple is to
allow exactly this sort of replacement.
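
As a sketch of what such a replaceable back end could look like: the class
and method names below are hypothetical and do not reflect the actual
Subversion FS interface. The text back end here deliberately lacks atomicity,
recoverability and hot backups, which are exactly the hard parts named above,
and it assumes keys and values contain no tabs or newlines.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """The narrow interface the rest of the code is allowed to use."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under key."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key."""


class MemoryBackend(Backend):
    """Stand-in for a database-backed store."""

    def __init__(self):
        self.records = {}

    def get(self, key):
        return self.records[key]

    def put(self, key, value):
        self.records[key] = value


class TextBackend(Backend):
    """Keeps records as 'key<TAB>value' lines, rewritten on every put."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                return dict(line.rstrip("\n").split("\t", 1) for line in f)
        except FileNotFoundError:
            return {}

    def get(self, key):
        return self._read()[key]

    def put(self, key, value):
        records = self._read()
        records[key] = value
        # Not atomic: a crash during this rewrite can lose data.
        with open(self.path, "w", encoding="utf-8") as f:
            for k in sorted(records):
                f.write(f"{k}\t{records[k]}\n")


def record_revision(backend, number, log):
    """Caller code touches only the Backend interface."""
    backend.put(f"rev/{number}", log)
```

Swapping one back end for the other requires no change to record_revision,
which is the property a simple interface is meant to preserve.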

Re: I'll help with db debugging

Posted by Chip Salzenberg <ch...@valinux.com>.

According to Greg Stein:
> I totally support replacing db for a post-1.0 version of Subversion.
> Or as a user-supplied patch / alternative.  But we will never get a
> 1.0 shipped, any time soon, if we implement our own storage system.
> It just won't happen.

I wouldn't want to hold up a 1.0 release over db.  But if a pure text
backend were ready in time ... well, let's just see if we can do that.

> But as Jim points out: please feel free. The software is definitely
> open for change. I would also support (for the 1.0 release itself)
> any kind of refactoring or introduction of APIs to enhance/improve
> pluggability of a backend, assuming they aren't too costly to put in.

That's all I could ask for.  Thank you.

> And lastly: welcome Chip! We met back in December when Dick Hardt
> and I came to visit at VA Linux, but I never figured that we'd be
> running into each other code-wise :-)

Ah, yes.  I didn't connect the name, but I remember you; rehi!
Hm.  I remember December, 2000.  "Dot com" wasn't an epithet....
-- 
Chip Salzenberg              - a.k.a. -             <ch...@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech

Re: I'll help with db debugging

Posted by Greg Stein <gs...@lyra.org>.

On Sun, Apr 01, 2001 at 11:00:46PM -0700, Chip Salzenberg wrote:
> According to Eric S. Raymond:
> > What little you gain [from db] in performance you will lose in
> > multiplied difficulties and schedule slips because corruption will
> > be so much harder to detect and recover from.
> 
> Now that you put it that way, I have to agree.  Compare rpm's binary
> database with Debian's nice, readable directory tree + text database.
> Debian wins, hands down, in quality of implementation.
> 
> OK, I'm going to fix the db problem by eliminating db.  :-)

Kind of a reset here: please recognize that I believe we are seeing a crash
because we're doing something wrong. Not DB. It happens to occur in DB, but
that doesn't mean it is DB's fault. And we aren't seeing any data corruption
(for myself, the crash occurs during the Apache request cleanup phase, when
we toss memory; I believe it is a double-free somewhere)

Second: I totally support replacing db for a post-1.0 version of Subversion.
Or as a user-supplied patch / alternative. But we will never get a 1.0
shipped, any time soon, if we implement our own storage system. It just
won't happen.

After 1.0, I will *help* with replacing the DB backend. I'd like to see a
SQL backend in there. Some others want a pure-text backend. It should all be
possible. Our interfaces between the FS core and the databases feel pretty
trim at the moment, but we'll just have to see. (I believe we need to make
skel's a DB-specific thing, which means the impact could actually be pretty
large)

But as Jim points out: please feel free. The software is definitely open for
change. I would also support (for the 1.0 release itself) any kind of
refactoring or introduction of APIs to enhance/improve pluggability of a
backend, assuming they aren't too costly to put in.

And lastly: welcome Chip! We met back in December when Dick Hardt and I came
to visit at VA Linux, but I never figured that we'd be running into each
other code-wise :-) Glad to have you here!

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: I'll help with db debugging

Posted by "Deven T. Corzine" <de...@ties.org>.

On Sun, 1 Apr 2001, Chip Salzenberg wrote:

> According to Eric S. Raymond:
> > What little you gain [from db] in performance you will lose in
> > multiplied difficulties and schedule slips because corruption will
> > be so much harder to detect and recover from.
> 
> Now that you put it that way, I have to agree.  Compare rpm's binary
> database with Debian's nice, readable directory tree + text database.
> Debian wins, hands down, in quality of implementation.
> 
> OK, I'm going to fix the db problem by eliminating db.  :-)

If the idea is to have a text-based format that's more transparent so that
one can identify corruption visually, wouldn't it work as well to have
mechanisms to dump the DB database into one or more such formats?  I see
the value in having a readable format, but is it necessarily preferable as
the native format?

Deven

Re: I'll help with db debugging

Posted by Chip Salzenberg <ch...@valinux.com>.

According to Eric S. Raymond:
> What little you gain [from db] in performance you will lose in
> multiplied difficulties and schedule slips because corruption will
> be so much harder to detect and recover from.

Now that you put it that way, I have to agree.  Compare rpm's binary
database with Debian's nice, readable directory tree + text database.
Debian wins, hands down, in quality of implementation.

OK, I'm going to fix the db problem by eliminating db.  :-)
-- 
Chip Salzenberg              - a.k.a. -             <ch...@valinux.com>
 "We have no fuel on board, plus or minus 8 kilograms."  -- NEAR tech