Posted to dev@subversion.apache.org by Karl Fogel <kf...@red-bean.com> on 2008/08/17 13:11:40 UTC

Re: svn commit: r32461 - trunk/notes

gstein@tigris.org writes:
> --- trunk/notes/wc-ng-design	Wed Aug 13 13:30:12 2008	(r32460)
> +++ trunk/notes/wc-ng-design	Wed Aug 13 13:42:30 2008	(r32461)
> @@ -306,8 +319,12 @@ There are 3 known models for storing met
>  groups of users:
>  
>   - in-subtree metadata storage (.svn subdir model, as in wc-1.0)
> +   ###GJS: euh... aren't we axing this? who has *requested* this?
>   - in-'tree root' metadata storage (working copy central)
>   - detached metadata storage (user-central)

Yeah, I also had thought we were just going to stop supporting that
model, except in non-upgrade situations (i.e., okay, you can keep your
old-style wc, but then you don't get any of the shiny new features).

> +Basic Storage Mechanics
> +-----------------------
> +
> +All metadata will be stored into a single SQLite database. This
> +includes all of the "entry" fields *and* all of the properties
> +attached to the files/directories. SQLite transactions will be used
> +rather than the "loggy" mechanics of wc-1.0.

Some kind of logginess, or at least carefully ordered actions, will
still be needed, since we've got working files to deal with.  But yeah,
I think the logginess will get a lot simpler with SQLite transactions.
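To illustrate the ordering concern, here is a minimal sketch, not Subversion code. It assumes a hypothetical `nodes` table and a hypothetical helper `install_file`; the point is only that the working-file rename and the metadata update are sequenced around one SQLite transaction, so a failed rename rolls the metadata back.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema, for illustration only.
SCHEMA = "CREATE TABLE IF NOT EXISTS nodes (path TEXT PRIMARY KEY, size INTEGER)"


def install_file(db, wc_dir, name, contents):
    """Write a working file and record it in the metadata database.

    Order of actions: write a temp file first, then open a transaction,
    update the row, rename the temp file into place, and commit.
    """
    target = os.path.join(wc_dir, name)
    fd, tmp = tempfile.mkstemp(dir=wc_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(contents)
    try:
        # "with db" commits on success and rolls back on an exception.
        with db:
            db.execute(
                "INSERT OR REPLACE INTO nodes (path, size) VALUES (?, ?)",
                (name, len(contents)),
            )
            os.replace(tmp, target)
    except Exception:
        # The rename did not happen or the update failed: drop the temp file.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

A crash between the rename and the commit would still leave the file in place with no matching row, which is the sort of case where some ordering or recovery logic remains necessary.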

> +Base text data will be stored in a multi-level directory structure,
> +keyed/named by the MD5 of the file. The database will record
> +appropriate mappings, content/compression types, and refcounts for the
> +base text files (for the shared case).

Sounds sane; we ought to be doing the same in the repository, really
(http://subversion.tigris.org/issues/show_bug.cgi?id=2286) :-).
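For concreteness, a rough sketch of what an MD5-keyed, multi-level text-base store with refcounts might look like. Everything here is an assumption for illustration: the two-level fan-out, the `text_base` table, and the function names are hypothetical and not taken from the design notes.

```python
import hashlib
import os
import sqlite3

# Hypothetical table: one row per stored base text.
SCHEMA = "CREATE TABLE IF NOT EXISTS text_base (md5 TEXT PRIMARY KEY, refcount INTEGER)"


def text_base_path(root, md5_hex, levels=2):
    """Map a checksum to root/ab/cd/abcd... using 2-hex-digit directories."""
    parts = [md5_hex[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, md5_hex)


def store_text_base(db, root, data):
    """Store data once per checksum; later stores only bump the refcount."""
    md5_hex = hashlib.md5(data).hexdigest()
    path = text_base_path(root, md5_hex)
    with db:  # one transaction for the refcount bookkeeping
        cur = db.execute(
            "UPDATE text_base SET refcount = refcount + 1 WHERE md5 = ?",
            (md5_hex,),
        )
        if cur.rowcount == 0:
            # First reference: write the file, then record it.
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
            db.execute(
                "INSERT INTO text_base (md5, refcount) VALUES (?, 1)",
                (md5_hex,),
            )
    return md5_hex
```

The fan-out keeps any single directory from accumulating too many entries, and the refcount lets several working copies share one stored file in the shared case.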

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: svn commit: r32461 - trunk/notes

Posted by Branko Čibej <br...@xbc.nu>.
Karl Fogel wrote:
> Branko Čibej <br...@xbc.nu> writes:
>   
>> I'd been wondering about sqlite performance in this context. I have
>> not a single data point to anchor my worries to, there's just this
>> nagging guilt about the BDB mistake we made.
>>     
>
> I've nothing against revlog format, but I really think our issue with
> BDB was over-normalized data: a failure to consider the costs of all the
> parsing and dereferencing involved in our
> extremely-comprehensible-but-er-not-so-extremely-performant schema of
> representations/entries/props/whatnot expressed as skels.
>
> Neither BDB nor SQLite by themselves need mean performance problems.
> The issue is how we use them.
>   

That is arguably the case ... though I do have some unrelated-to-svn 
data that suggest BDB can stab you in the back performance-wise quite 
unexpectedly.

That said, we could've done a much better job at configuring BDB for 
performance in our repos. Water under the bridge ... and too many knobs 
to turn, at that.

-- Brane



Re: svn commit: r32461 - trunk/notes

Posted by Karl Fogel <kf...@red-bean.com>.
Branko Čibej <br...@xbc.nu> writes:
> I'd been wondering about sqlite performance in this context. I have
> not a single data point to anchor my worries to, there's just this
> nagging guilt about the BDB mistake we made.

I've nothing against revlog format, but I really think our issue with
BDB was over-normalized data: a failure to consider the costs of all the
parsing and dereferencing involved in our
extremely-comprehensible-but-er-not-so-extremely-performant schema of
representations/entries/props/whatnot expressed as skels.

Neither BDB nor SQLite by themselves need mean performance problems.
The issue is how we use them.



Re: svn commit: r32461 - trunk/notes

Posted by Branko Čibej <br...@xbc.nu>.
Greg Stein wrote:
>>> Sounds sane; we ought to be doing the same in the repository, really
>>> (http://subversion.tigris.org/issues/show_bug.cgi?id=2286) :-).
>>>       
>> Despite having been a fan of content indexing in the past, these days I
>> think a revlog-like structure would be a better idea. I learned a thing or
>> two about the goodness of denormalized datastores in the meantime ... not
>> that denormalization is always a good thing.
>>     
>
> Are you talking revlog on the repository, or for the text-bases on the
> client (my original point) ?
>
> For the repository: I agree, and want to work on that post-WC effort.
> For the client: I don't understand the benefit... ?
>   

I meant for the repository, sorry I didn't make that clearer.



Re: svn commit: r32461 - trunk/notes

Posted by Greg Stein <gs...@gmail.com>.
On Sun, Aug 17, 2008 at 12:10 PM, Branko Čibej <br...@xbc.nu> wrote:
>...
>> gstein@tigris.org writes:
>...
>>> +All metadata will be stored into a single SQLite database. This
>>> +includes all of the "entry" fields *and* all of the properties
>>> +attached to the files/directories. SQLite transactions will be used
>>> +rather than the "loggy" mechanics of wc-1.0.
>...
> I'd been wondering about sqlite performance in this context. I have not a
> single data point to anchor my worries to, there's just this nagging guilt
> about the BDB mistake we made.

I do have data points that suggest 100ms per commit. That was on one
particular box, and now that I think about it, it was on an NFS mount.
But if you take a look at some of my other notes, I plan to put
timing/debugging code in there to see what happens. Some algorithms
may need to change to reduce the number of commits, but the main
point is that I'll be keeping an eye on it.
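As a sketch of the kind of timing check meant here (an illustration only, not the planned instrumentation; the table `t` and both function names are made up), the following compares one commit per row against a single commit for the whole batch:

```python
import sqlite3
import time


def time_commits(db_path, n=20):
    """Insert n rows with one commit each; return per-commit seconds."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS t (k INTEGER PRIMARY KEY, v TEXT)")
    db.commit()
    timings = []
    for i in range(n):
        start = time.perf_counter()
        db.execute("INSERT INTO t (v) VALUES (?)", (str(i),))
        db.commit()
        timings.append(time.perf_counter() - start)
    db.close()
    return timings


def time_batched(db_path, n=20):
    """Insert n rows inside a single transaction; return total seconds."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS t (k INTEGER PRIMARY KEY, v TEXT)")
    db.commit()
    start = time.perf_counter()
    with db:  # one commit for the whole batch
        db.executemany(
            "INSERT INTO t (v) VALUES (?)", [(str(i),) for i in range(n)]
        )
    elapsed = time.perf_counter() - start
    db.close()
    return elapsed
```

Running both against a database file on the filesystem in question (local disk versus an NFS mount, say) would show how much of the cost is per commit rather than per row.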

>>> +Base text data will be stored in a multi-level directory structure,
>>> +keyed/named by the MD5 of the file. The database will record
>>> +appropriate mappings, content/compression types, and refcounts for the
>>> +base text files (for the shared case).
>>
>> Sounds sane; we ought to be doing the same in the repository, really
>> (http://subversion.tigris.org/issues/show_bug.cgi?id=2286) :-).
>
> Despite having been a fan of content indexing in the past, these days I
> think a revlog-like structure would be a better idea. I learned a thing or
> two about the goodness of denormalized datastores in the meantime ... not
> that denormalization is always a good thing.

Are you talking revlog on the repository, or for the text-bases on the
client (my original point) ?

For the repository: I agree, and want to work on that post-WC effort.
For the client: I don't understand the benefit... ?

Thanks,
-g

Re: svn commit: r32461 - trunk/notes

Posted by Branko Čibej <br...@xbc.nu>.
Karl Fogel wrote:
> gstein@tigris.org writes:
>   
>> +Basic Storage Mechanics
>> +-----------------------
>> +
>> +All metadata will be stored into a single SQLite database. This
>> +includes all of the "entry" fields *and* all of the properties
>> +attached to the files/directories. SQLite transactions will be used
>> +rather than the "loggy" mechanics of wc-1.0.
>>     
>
> Some kind of logginess, or at least carefully ordered actions, will
> still be needed, since we've got working files to deal with.  But yeah,
> I think the logginess will get a lot simpler with SQLite transactions.
>   

I'd been wondering about sqlite performance in this context. I have not 
a single data point to anchor my worries to, there's just this nagging 
guilt about the BDB mistake we made.

>> +Base text data will be stored in a multi-level directory structure,
>> +keyed/named by the MD5 of the file. The database will record
>> +appropriate mappings, content/compression types, and refcounts for the
>> +base text files (for the shared case).
>>     
>
> Sounds sane; we ought to be doing the same in the repository, really
> (http://subversion.tigris.org/issues/show_bug.cgi?id=2286) :-).
>   

Despite having been a fan of content indexing in the past, these days I 
think a revlog-like structure would be a better idea. I learned a thing 
or two about the goodness of denormalized datastores in the meantime ... 
not that denormalization is always a good thing.

-- Brane
