You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Branko Čibej <br...@xbc.nu> on 2004/05/28 06:51:06 UTC

Path lookup table (was: Re: Locking design (was Re: svn commit: r9885 - trunk/notes))

Greg Hudson wrote:

>(Incidentally, repeating yourself over and over again on this point,
>without presenting any new arguments, as you did in
><40...@xbc.nu> in response to Ben, isn't really helping your
>argument.)
>  
>
As usual, I keep forgetting people can't read my mind, and I'm drawing 
on my experience in implementing "real" filesystems, which is probably 
not common... This explanation could get a bit longish.

I'm looking at the lock lookup table as a transition path towards fixing 
one of the deficiencies of the current FS schema (at least the 
database-backend version), and that's inefficient representation of the 
revision/path index.

In a filesystem -- Subversion's FS is no exception -- most operations 
can be broken down into two stages: a) resolve the path into a 
filesystem object (node revision), and b) do your stuff with that 
object. All access checks, symlink translation, etc. happen during the 
first phase, which is usually the most time-critical part of any 
filesystem implementation.

In the current BDB schema, directory contents are represented as 
variable-sized, unsorted list of entries. In order to find a name in the 
directory, the FS has to read the whole list and scan it sequentially, 
which means that the cost of path lookups is proportional to directory 
size in both time and memory. It also means that we have to write a 
complete new copy of the entries list if a single entry changes, which 
uses up a fair amount of space in the repository.

Incidentally, many "real" filesystems make the same mistake, most 
notably older Unix filesystems. Modern FS's (e.g., NTFS, ReiserFS, XFS) 
use some sort of ordering/indexing to make each lookup cost O(log N) 
time (NTFS and XFS use a B-tree). Another common way to speed up the 
first phase is to put all the information needed for the lookup, access 
control, etc. into the same index, increasing locality of reference. 
NTFS, for example, puts directory entries, ACLs, and even the contents 
of small files into a single B-tree.

In the near future, we'll be adding locking and later ACLs to 
Subversion. Both features require a path->object lookup. Assuming we 
have directory locks, both require a cheap way to propagate inherited 
values. Another feature that's on our list somewhere are unversioned 
properties, perhaps even unversioned files(!). All of these could use a 
single path-based lookup index. My vision is to slowly evolve this index 
until the point where, by 2.0, it can replace the current directory 
entries list.

Of course, life wouldn't be interesting if this index could be a simple 
path->info mapping. BDB supports variable-length keys, but table-based 
databases (which we want to support in future) are notoriously cranky 
about rejecting such, and they're not very efficient in BDB, either. So 
the structure of the index would have look like this:

    (dir-id, name, [txn-id]) -> info

The txn-id is needed for ACLs, for example, because the same path can 
have different ACLs in different revisions. The lookup would take a 
reference txn id (e.g., resolved from a revision number) and do a "find 
first smaller than" type of key search. HEAD would have to be 
represented by a magic value; locks are always in HEAD.

The dir-id is a unique directory identifier that sets the context for 
the name. We may be able to use  a (node-id, copy-id) pair for that; I 
haven't thought through all the ramifications. If not, we'll have to 
generate a unique directory ID for this index, which is not so nice.

The info contains the bits and pieces attached to the path:

    * lock object, ACL, unversioned prop,... for this entry
    * If the entry represents a directory:
          o the dir-id of this directory, used by the next level of lookup
          o a flag indicating whether the lock/acl/prop/... is inheritable
          o maybe a flag if a lock exists anywhere _beneath_ this directory

The last item could be an optimisation for recursive directory locking: 
if this flag is set, then a recursive lock on the directory would 
conflict with an existing lock somewhere beneath it (yes, even if you 
own the lock). Maintaining this flag is a bit tricky, but probably 
cheaper than scanning the whole subtree for possible lock conflicts. It 
might require a few tweaks in the structure of the index to get an 
optimal implementation, though.

You'll notice that I didn't spend too much time worrying about how this 
design scales from locks/acls -- which are expected to be fairly 
"sparse" in the tree -- to a full-blown directory index. There are some 
tricky questions to answer; for example, I'm still not sure how to avoid 
having to write a zillion index entries every time a directory changes, 
either directly or through bubble-up; or how to keep the constant-time 
nature of directory copies. It might be possible to even avoid 
"physical" bubble-up completely. Anyway, these ruminations need a lot of 
evolving first.

There. I've put my brane on display. Hope you enjoy mucking through it. :-)

-- Brane



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Path lookup table (was: Re: Locking design (was Re: svn commit: r9885 - trunk/notes))

Posted by Alon Ziv <al...@nolaviz.org>.
On Friday 28 May 2004 09:51, Branko Čibej wrote:
> [...]
> the structure of the index would have look like this:
>
>     (dir-id, name, [txn-id]) -> info
>
> [...]
>
> The info contains the bits and pieces attached to the path:
>
>     * lock object, ACL, unversioned prop,... for this entry
>     * If the entry represents a directory:
>           o the dir-id of this directory, used by the next level of lookup
>           o a flag indicating whether the lock/acl/prop/... is inheritable
>           o maybe a flag if a lock exists anywhere _beneath_ this directory
>

This is very similar to a structure I've been thinking about lately -- used to 
store copy source information. In fact, the only difference is that my vesion 
used a revision number instead of txn-id (as I want to be able to access them 
sequentially).
The method for using this information is just to add a copy-to row for every 
copy ever made. When computing the response to a log request, we would also 
merge the copy-to information of the directories along the path to the 
requested node with the actual changes in the node.
So, we can consider copy-source access as another possible use/benefit of this 
table... Thus implementing another oft-requested feature.

Of course, I'm ignoring the fs_fs implementation right now; so far, I haven't 
thought of any efficient representation for this information in plain files 
(not to mention safety guarantees and the like).

	-az

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Path lookup table (was: Re: Locking design (was Re: svn commit: r9885 - trunk/notes))

Posted by "C. Michael Pilato" <cm...@collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:

> I think it's important to answer these questions before we look at a
> table of this nature as a future direction for FS directory lookups.  I
> suspect the original FS designers did consider using a separate DB
> association for each directory entry, and decided it would be too
> inefficient.

Please don't give us that much credit.  

The original FS schema had the directory entries as a massive skel
(like they are today) embedded within the NODE-REVISION skel (in the
nodes table).  (That was back when file contents were also stored in
the NODE-REVISIONs, too.)  When we realized how stupid that was, we
came up with the representations/strings tables, and moved the
contents items into those tables.  Nobody had given much thought to
the efficiency of directory lookups in the skel system -- the idea was
to make things as simple as possible.  Additionally, we thought
deltificaton would earn space saving on directory entries lists, which
tend to be quite compressible.  But since the un-deltification costs
too much in time, we turned off deltification of directory entries
lists -- so we didn't even get the space savings we'd hoped for.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Path lookup table

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-05-30 at 17:08, Branko Čibej wrote:
> >Actually, I think there is a big question.  Unlike a regular filesystem,
> >in Subversion almost all FS usage involves iterating over whole
> >directories.

> How? That doesn't fit.

Checkout, update, switch, diff, status.  They all work by grabbing
entire directories out of the repositories, reporting the files that
have changed, pruning off the subdirs that haven't, and recursing into
the remaining subdirs.  You won't improve the speed of these operations
with a path lookup table; in fact, if the path lookup table is your only
representation of directories, you'll slow them down dramatically (only
by a constant factor, of course, but a big one).

Commits are different.  But the big, time-consuming commits are imports,
which involve repeated operations on a single directory.  We're not very
efficient at doing that right now (in BDB, that is; FSFS is quite speedy
because of the directory-plus-changes representation of directories
within transactions), but we don't need a path lookup table in order to
get O(n) performance for n adds to a directory.

The only case where I can imagine a path lookup table really helping is
when you're doing something like serving a web site of out of svn, and
you have a bunch of big directories, and client accesses to those
directories are pretty random.  That's pretty marginal.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Path lookup table

Posted by Branko Čibej <br...@xbc.nu>.
Greg Hudson wrote:

>On Sun, 2004-05-30 at 16:38, Branko Čibej wrote:
>  
>
>>I think that there's no question about using anything _except_ a table
>>for directory lookups.
>>    
>>
>
>Actually, I think there is a big question.  Unlike a regular filesystem,
>in Subversion almost all FS usage involves iterating over whole
>directories.
>
How? That doesn't fit.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Path lookup table

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-05-30 at 16:38, Branko Čibej wrote:
> I think that there's no question about using anything _except_ a table
> for directory lookups.

Actually, I think there is a big question.  Unlike a regular filesystem,
in Subversion almost all FS usage involves iterating over whole
directories.  With appropriate caching, that's very efficient with our
current representation.

> The only issue is optimizing the representation of directory changes.

I haven't the foggiest how we could make progress on this issue.

> I think we need experience with such a table before we can finalize
> any kind of design, that's why I'm proposing to start this transition
> and experience-gathering now.

As you've pointed out before, the ACL and lock tables are expected to be
sparsely populated, and they won't undergo changes in the same manner as
directories.  Moreover, neither behaves like directories from a
versioning point of view (locks aren't versioned at all, and ACLs don't
become immutable).  So what experience is to be gained here?

A better way to gain experience is to create a new branch (or new FS
module, although that would require a lot of symbol renaming), add the
path lookup table there, and run some performance benchmarks with
real-world data.  That has nothing to do with locks and ACLs, of course.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Path lookup table

Posted by Branko Čibej <br...@xbc.nu>.
Greg Hudson wrote:

>On Fri, 2004-05-28 at 02:51, Branko Čibej wrote:
>  
>
>>    (dir-id, name, [txn-id]) -> info
>>    
>>
>
>[...]
>  
>
>>You'll notice that I didn't spend too much time worrying about how this 
>>design scales from locks/acls -- which are expected to be fairly 
>>"sparse" in the tree -- to a full-blown directory index. There are some 
>>tricky questions to answer; for example, I'm still not sure how to avoid 
>>having to write a zillion index entries every time a directory changes, 
>>either directly or through bubble-up; or how to keep the constant-time 
>>nature of directory copies. It might be possible to even avoid 
>>"physical" bubble-up completely. Anyway, these ruminations need a lot of 
>>evolving first.
>>    
>>
>
>I think it's important to answer these questions before we look at a
>table of this nature as a future direction for FS directory lookups.  I
>suspect the original FS designers did consider using a separate DB
>association for each directory entry, and decided it would be too
>inefficient.
>  
>
I think that there's no question about using anything _except_ a table
for directory lookups. The only issue is optimizing the representation
of directory changes. I think we need experience with such a table
before we can finalize any kind of design, that's why I'm proposing to
start this transition and experience-gathering now.

I admit that my point of view is very (B)DB-centric, so my arguments
don't necessarily hold for FSFS -- for all I know, directory
representation may already be optimal there, so adding a path lookup
table wouldn't bring so much benefit.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Path lookup table (was: Re: Locking design (was Re: svn commit: r9885 - trunk/notes))

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2004-05-28 at 02:51, Branko Čibej wrote:
>     (dir-id, name, [txn-id]) -> info

[...]
> You'll notice that I didn't spend too much time worrying about how this 
> design scales from locks/acls -- which are expected to be fairly 
> "sparse" in the tree -- to a full-blown directory index. There are some 
> tricky questions to answer; for example, I'm still not sure how to avoid 
> having to write a zillion index entries every time a directory changes, 
> either directly or through bubble-up; or how to keep the constant-time 
> nature of directory copies. It might be possible to even avoid 
> "physical" bubble-up completely. Anyway, these ruminations need a lot of 
> evolving first.

I think it's important to answer these questions before we look at a
table of this nature as a future direction for FS directory lookups.  I
suspect the original FS designers did consider using a separate DB
association for each directory entry, and decided it would be too
inefficient.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Path lookup table (was: Re: Locking design (was Re: svn commit: r9885 - trunk/notes))

Posted by Greg Hudson <gh...@MIT.EDU>.
On Fri, 2004-05-28 at 11:06, Travis P wrote:
> On May 28, 2004, at 1:51 AM, Branko Čibej wrote:
> 
> > BDB supports variable-length keys, but table-based databases (which we 
> > want to support in future) are notoriously cranky about rejecting 
> > such, and they're not very efficient in BDB, either. So the structure 
> > of the index would have look like this:
> >
> >    (dir-id, name, [txn-id]) -> info
> 
> I'm not sure what you mean by cranky.

Well, it doesn't matter.  There's a more important reason why we can't
have a lookup table by full pathname: it isn't compatible with O(1)
directory copies.

(We could have a *cache* table indexed by full path, as long as the
cache is populated in a lazy fashion and is of fixed size.  Even then,
directory copies could be perceived as having a non-O(1) performance
impact because of their impact on the cache.  Easier to avoid having any
DB tables indexed by full path.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Path lookup table (was: Re: Locking design (was Re: svn commit: r9885 - trunk/notes))

Posted by Travis P <sv...@castle.fastmail.fm>.
On May 28, 2004, at 1:51 AM, Branko Čibej wrote:

> BDB supports variable-length keys, but table-based databases (which we 
> want to support in future) are notoriously cranky about rejecting 
> such, and they're not very efficient in BDB, either. So the structure 
> of the index would have look like this:
>
>    (dir-id, name, [txn-id]) -> info

I'm not sure what you mean by cranky.  I'm most familiar with 
PosgreSQL.  I believe most DBs support variable length text-type 
fields.  I know PosgreSQL does, and I'd be surprised if DB2, Oracle, 
MySQL didn't.  From the docs:  "In addition, PostgreSQL provides the 
'text' type, which stores strings of any length.  Although the type 
'text' is not in the SQL standard, several other SQL database 
management systems have it as well."  Such fields can be primary keys.  
Given the quality of the query planner, I suggest not writing off 
apparent encoding inefficiencies because you might find that backend 
efficiencies more than make up for it.  (There are also variable-length 
binary types available, though I think there is more variation in them 
between different SQL databases.)

PostgreSQL also supports multi-column indices, which can be efficiently 
used according to this rule:  "The query planner can use a multicolumn 
index for queries that involve the leftmost column in the index 
definition plus any number  of columns listed to the right of it, 
without a gap."  I mention it because it looks like that might be 
particularly useful given your tuple there with an optional third 
member.

I know this doesn't have any immediate impact as there is no SQL db 
backend.  I'm just trying to follow along and was surprised by your 
comment.

-Travis


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org