You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Brian W. Fitzpatrick" <fi...@collab.net> on 2005/01/04 20:49:00 UTC

Revised Proposal: Improved locking implementation for fsfs

Having had the holiday to ponder the responses to the last proposal, I
began to feel like we were just barking up the wrong tree.  So, based on
a discussion I just had with sussman and cmpilato, I'm going to propose
a radically different solution to this that (I think) falls much more in
line with Peter Lundblad's suggestion that I kept (stupidly)
pooh-poohing as inadequate (Sorry Peter).

So here we go.

There are several limitations that we must overcome when designing
locking in fs_fs:

1. We have to handle UTF8 path names: not all filesystems do this.
2. We have to accomodate case-sensitive/case-preserving filesystems.
3. We have to worry about path length limits (262 bytes in some cases).

Our primary use cases for the locking system are:

0. Lock path FOO.
1. Is path FOO locked?
2. Quickly return all locks BELOW path FOO.

The new design radically changes the locks directory and leaves the
lock-tokens directory fundamentally unchanged.

The locks directory will no longer be a tree mirroring the filesystem
layout, but instead, it will be a single directory (or spread across a
series of 10, 100, or 1000 directories) containing files that will be
named according to a monotonically increasing integer (stored in the
revs equivalent of the db/current file.  

Each file will represent a path component (or "node")--whether the path
component corresponds to a directory or a file.

Each node will contain, on the first line, "FILE" if the node represents
a file, or "DIR" if the node represents a directory.

If the node is a directory, the second through n lines will contain a
serialized hash of entries, where the hash key is the name of the child
component, and the corresponding value is the integer of the filename
where the node's lock is stored.

If the node is a file, the second through n lines will contain a
serialized hash representing the svn_lock_t that is held on the file.

So, for example (using a simplified file format), if we lock the file at
path foo/bar/baz.c, the following files will be created in the locks
directory:

Note that file '0' *must* correspond to the root directory, so it will
be created by default, and it's name will always be '/'.

0:
  "DIR"
  HASH_BEGIN
  "foo"  ->  1
  HASH_END

1:
  "DIR"
  HASH_BEGIN
  "bar" -> 2
  HASH_END

2:
  "FILE"
  <<<lock representation here>>>

This should address all of the wonkiness WRT path lengths and fs
encodings since everything gets encapsulated in these little
"entries-type" files.

Thotz?

Fitz


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Brian W. Fitzpatrick" <fi...@collab.net>.
On Jan 6, 2005, at 6:31 AM, Peter N. Lundblad wrote:

> On Thu, 6 Jan 2005, [UTF-8] Branko Ä^Libej wrote:
>
>> Brian W. Fitzpatrick wrote:
>>
>> In short, you avoid quadratic I/O behaviour here in exactly the same 
>> way
>> as we did in the WC -- by caching the entries files. I'm not sure if
>> this is a valid thing to do in the face of possibly multi-process 
>> access
>> to the FS, though.

Just to be clear, Branko wrote that text

> As long as we think that our current APIs, where each lock/unlock 
> requires
> at least one network round-trip, I think we shouldn't care very much 
> about
> this. Locking 10,000 files in one directory is really a corner case, 
> isn't
> it?

I think so.  I mean, the round-trips alone are going to take forever, 
regardless of our backend implementation.  While you *can* do that with 
Subversion, I sure wouldn't recommend it.

-Fitz

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Brian W. Fitzpatrick" <fi...@collab.net>.
On Jan 6, 2005, at 9:28 AM, John Szakmeister wrote:

> Peter N. Lundblad wrote:
>> On Thu, 6 Jan 2005, [UTF-8] Branko �^Libej wrote:
>>
>>
>>> Brian W. Fitzpatrick wrote:
>>>
>>> In short, you avoid quadratic I/O behaviour here in exactly the same 
>>> way
>>> as we did in the WC -- by caching the entries files. I'm not sure if
>>> this is a valid thing to do in the face of possibly multi-process 
>>> access
>>> to the FS, though.
>>>
>>
>> As long as we think that our current APIs, where each lock/unlock 
>> requires
>> at least one network round-trip, I think we shouldn't care very much 
>> about
>> this. Locking 10,000 files in one directory is really a corner case, 
>> isn't
>> it?
>
> I'm not sure about that.  I currently work with a customer who has a
> pretty large setup, and they're dying for locking to get into place.
> They've already asked me several times about being able to have locks 
> on
> every file in the repository.  I would not put it past them to hit this
> case... and to do it the first time they get locking up and running.

Well, even if we implement locking as an O(1) operation on the server, 
it's going to take *forever* for your customer to lock and unlock every 
file in the repository.  That is to say that they're going to have 
bigger problems that the implementation of the lock storage mechanism.

-Fitz

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Philip Martin <ph...@codematters.co.uk>.
Ben Collins-Sussman <su...@collab.net> writes:

> I think what they're saying is:  "we'd like to *require* locks on
> every file in the repository before editing", right?  Not that they
> literally want to attach an active lock to every file in the
> repository at the same time.
>
> In other words, when they import a project, the first thing they'll do
> is attach the new 'svn:needs-lock' property to every file.

Once svn:needs-lock is set on all files any global change, an updated
copyright say, will require all the files to be locked.  I think it's
safe to say that some users will set svn:needs-lock on all files and
will then attempt to attach an active lock to every file.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by John Peacock <jp...@rowman.com>.
John Szakmeister wrote:

> They'd like that too, as well as the ability to automatically add
> svn:need-locks to new files. :-)  

Inherited properties anyone?  <duck>

> But no, I think these guys will take
> out a lock on hundreds/thousands of files at once.  They're a bit of an
> edge case, so I wouldn't worry over it.  

Did you mean "head" or "edge" there?

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD  20706
301-459-3366 x.5010
fax 301-429-5748

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sun, 9 Jan 2005, Ben Collins-Sussman wrote:

>
> On Jan 9, 2005, at 2:37 PM, Branko Čibej wrote:
> >
> > BTW, can you lock several files with a single DAV request?
> >
> >
>
> I don't think so.  The LOCK request is of the form
>
>     LOCK URI
>     [headers, including authn]
>
>     [body contains lock comment and lock type]
>
>
Ofcourse, this doesn't mean that we can't change the API to take an array.
That would give the svn protocol an(other) advantage performance-wise:-)
It can as well wait if we don't feel it is very important.

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Julian Reschke <ju...@gmx.de>.
Branko Čibej wrote:
> Ben Collins-Sussman wrote:
> 
>>
>> On Jan 9, 2005, at 2:37 PM, Branko Čibej wrote:
>>
>>>
>>> BTW, can you lock several files with a single DAV request?
>>>
>>>
>>
>> I don't think so.  The LOCK request is of the form
>>
>>    LOCK URI
>>    [headers, including authn]
>>
>>    [body contains lock comment and lock type]
>>
>>
>> Because LOCK only works on a single URI, the only way to "lock many 
>> things" is to lock a whole collection.  Which we don't support.
> 
> 
> But does the URI have to identify a file (or collection), or can it be a 
> synthetic beastie that identifies a list of resources? In other words, 
> is DAV flexible enough to let you define a synthetic collection that 
> exists only for the lifetime of one request?

Nothing prevents you from doing that in a server, but it would still be 
different from individually locking the files. Keep in mind that a LOCK 
request always creates a single lock, which may happen to affect more 
than one resource (through depth:infinity). In particular, inheritence 
is by URI containment, that is, if you unlock that synthetic collection, 
locks on the indirectly locked resources go away (same for DELETE or MOVE).

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Branko Čibej <br...@xbc.nu>.
Ben Collins-Sussman wrote:

>
> On Jan 9, 2005, at 2:37 PM, Branko Čibej wrote:
>
>>
>> BTW, can you lock several files with a single DAV request?
>>
>>
>
> I don't think so.  The LOCK request is of the form
>
>    LOCK URI
>    [headers, including authn]
>
>    [body contains lock comment and lock type]
>
>
> Because LOCK only works on a single URI, the only way to "lock many 
> things" is to lock a whole collection.  Which we don't support.

But does the URI have to identify a file (or collection), or can it be a 
synthetic beastie that identifies a list of resources? In other words, 
is DAV flexible enough to let you define a synthetic collection that 
exists only for the lifetime of one request?

-- Brane



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Ben Collins-Sussman <su...@collab.net>.
On Jan 9, 2005, at 2:37 PM, Branko Čibej wrote:
>
> BTW, can you lock several files with a single DAV request?
>
>

I don't think so.  The LOCK request is of the form

    LOCK URI
    [headers, including authn]

    [body contains lock comment and lock type]


Because LOCK only works on a single URI, the only way to "lock many 
things" is to lock a whole collection.  Which we don't support.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:

>On Thu, 6 Jan 2005, John Szakmeister wrote:
>
>  
>
>>They'd like that too, as well as the ability to automatically add
>>svn:need-locks to new files. :-)  But no, I think these guys will take
>>out a lock on hundreds/thousands of files at once.  They're a bit of an
>>edge case, so I wouldn't worry over it.  I just wanted to point out that
>> I know of at least one team that might do such a thing.
>>
>>    
>>
>Just to make it clear. The current proposal for lock implementation in
>fsfs will not have quadratic time complexity for lock, since you don't
>need to rewrite the whole file. Unlock is quadratic. If we add APIs to
>lock/unlock many paths at a time, we can rewrite each directory file once
>without changing the file format. So that will be linear in the number of
>paths then.
>
>BTW, it might be a good idea to have an unlock array of paths function in
>the repos/fs APIs for unlocking commits. But that can wait until after 1.2
>IMO. Then we know if it makes a big difference in practice.
>  
>
I think it would be simpler for the locking API to only work with arrays 
of locks, at least in the RA layer. It's a given that, if every 
lock/unlock causes a network roundtrip, then using an array will be 
visibly faster even if you only operate on two locks at a time. It 
doesn't really make sense to have only the slower method in 1.2 and then 
make people wait until 1.3 for the more optimal version.

BTW, can you lock several files with a single DAV request?

-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 6 Jan 2005, John Szakmeister wrote:

> They'd like that too, as well as the ability to automatically add
> svn:need-locks to new files. :-)  But no, I think these guys will take
> out a lock on hundreds/thousands of files at once.  They're a bit of an
> edge case, so I wouldn't worry over it.  I just wanted to point out that
>  I know of at least one team that might do such a thing.
>
Just to make it clear. The current proposal for lock implementation in
fsfs will not have quadratic time complexity for lock, since you don't
need to rewrite the whole file. Unlock is quadratic. If we add APIs to
lock/unlock many paths at a time, we can rewrite each directory file once
without changing the file format. So that will be linear in the number of
paths then.

BTW, it might be a good idea to have an unlock array of paths function in
the repos/fs APIs for unlocking commits. But that can wait until after 1.2
IMO. Then we know if it makes a big difference in practice.

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by John Szakmeister <jo...@szakmeister.net>.
Ben Collins-Sussman wrote:
> 
> On Jan 6, 2005, at 9:28 AM, John Szakmeister wrote:
> 
>> Peter N. Lundblad wrote:
>>
>>> On Thu, 6 Jan 2005, [UTF-8] Branko ╴^Libej wrote:
>>>
>>>
>>>> Brian W. Fitzpatrick wrote:
>>>>
>>>> In short, you avoid quadratic I/O behaviour here in exactly the same
>>>> way
>>>> as we did in the WC -- by caching the entries files. I'm not sure if
>>>> this is a valid thing to do in the face of possibly multi-process
>>>> access
>>>> to the FS, though.
>>>>
>>>
>>> As long as we think that our current APIs, where each lock/unlock
>>> requires
>>> at least one network round-trip, I think we shouldn't care very much
>>> about
>>> this. Locking 10,000 files in one directory is really a corner case,
>>> isn't
>>> it?
>>
>>
>> I'm not sure about that.  I currently work with a customer who has a
>> pretty large setup, and they're dying for locking to get into place.
>> They've already asked me several times about being able to have locks on
>> every file in the repository.  I would not put it past them to hit this
>> case... and to do it the first time they get locking up and running.
>>
> 
> I think what they're saying is:  "we'd like to *require* locks on every
> file in the repository before editing", right?  Not that they literally
> want to attach an active lock to every file in the repository at the
> same time.
> 
> In other words, when they import a project, the first thing they'll do
> is attach the new 'svn:needs-lock' property to every file.  Then working
> copies will show every file as read-only all the time, and user will
> have to 'svn lock' to begin edits on any file.

They'd like that too, as well as the ability to automatically add
svn:need-locks to new files. :-)  But no, I think these guys will take
out a lock on hundreds/thousands of files at once.  They're a bit of an
edge case, so I wouldn't worry over it.  I just wanted to point out that
 I know of at least one team that might do such a thing.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Ben Collins-Sussman <su...@collab.net>.
On Jan 6, 2005, at 9:28 AM, John Szakmeister wrote:

> Peter N. Lundblad wrote:
>> On Thu, 6 Jan 2005, [UTF-8] Branko ╴^Libej wrote:
>>
>>
>>> Brian W. Fitzpatrick wrote:
>>>
>>> In short, you avoid quadratic I/O behaviour here in exactly the same 
>>> way
>>> as we did in the WC -- by caching the entries files. I'm not sure if
>>> this is a valid thing to do in the face of possibly multi-process 
>>> access
>>> to the FS, though.
>>>
>>
>> As long as we think that our current APIs, where each lock/unlock 
>> requires
>> at least one network round-trip, I think we shouldn't care very much 
>> about
>> this. Locking 10,000 files in one directory is really a corner case, 
>> isn't
>> it?
>
> I'm not sure about that.  I currently work with a customer who has a
> pretty large setup, and they're dying for locking to get into place.
> They've already asked me several times about being able to have locks 
> on
> every file in the repository.  I would not put it past them to hit this
> case... and to do it the first time they get locking up and running.
>

I think what they're saying is:  "we'd like to *require* locks on every 
file in the repository before editing", right?  Not that they literally 
want to attach an active lock to every file in the repository at the 
same time.

In other words, when they import a project, the first thing they'll do 
is attach the new 'svn:needs-lock' property to every file.  Then 
working copies will show every file as read-only all the time, and user 
will have to 'svn lock' to begin edits on any file.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Ben Collins-Sussman <su...@collab.net>.
On Jan 6, 2005, at 12:18 PM, Julian Reschke wrote:

> Justin Erenkrantz wrote:
>> --On Thursday, January 6, 2005 4:51 PM +0100 "Peter N. Lundblad" 
>> <pe...@famlundblad.se> wrote:
>>> we want to handle locking hundreds of files at the same time
>>> appropriately, we need "lock/unlock these paths" APIs instead. I 
>>> think DAV
>>> will require one round-trip per locked path anyway.
>> A LOCK request with a depth of infinity would do that in one shot.  I 
>> forget whether we're planning to implement that or not.  -- justin
>
> Depends on what one understands at "locking hundreds of files at the 
> same time". A (WebDAV)-LOCK with depth infinity creates one single 
> lock that protects a whole URI namespace. However, it's still *one* 
> lock, that is if you LOCK "a" containing "a/b" and "a/c", you can't 
> remove the lock from "a/c" without removing the lock from all other 
> protected URIs as well.
>

FYI:  we decided not to support locks on collections (directories).  
For now, at least.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Julian Reschke <ju...@gmx.de>.
Justin Erenkrantz wrote:
> --On Thursday, January 6, 2005 4:51 PM +0100 "Peter N. Lundblad" 
> <pe...@famlundblad.se> wrote:
> 
>> we want to handle locking hundreds of files at the same time
>> appropriately, we need "lock/unlock these paths" APIs instead. I think 
>> DAV
>> will require one round-trip per locked path anyway.
> 
> 
> A LOCK request with a depth of infinity would do that in one shot.  I 
> forget whether we're planning to implement that or not.  -- justin

Depends on what one understands at "locking hundreds of files at the 
same time". A (WebDAV)-LOCK with depth infinity creates one single lock 
that protects a whole URI namespace. However, it's still *one* lock, 
that is if you LOCK "a" containing "a/b" and "a/c", you can't remove the 
lock from "a/c" without removing the lock from all other protected URIs 
as well.

Best regards, Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Thursday, January 6, 2005 4:51 PM +0100 "Peter N. Lundblad" 
<pe...@famlundblad.se> wrote:

> we want to handle locking hundreds of files at the same time
> appropriately, we need "lock/unlock these paths" APIs instead. I think DAV
> will require one round-trip per locked path anyway.

A LOCK request with a depth of infinity would do that in one shot.  I 
forget whether we're planning to implement that or not.  -- justin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 6 Jan 2005, John Szakmeister wrote:

> Peter N. Lundblad wrote:
> > On Thu, 6 Jan 2005, [UTF-8] Branko �^Libej wrote:
> >
> >
> >>Brian W. Fitzpatrick wrote:
> >>
> >>In short, you avoid quadratic I/O behaviour here in exactly the same way
> >>as we did in the WC -- by caching the entries files. I'm not sure if
> >>this is a valid thing to do in the face of possibly multi-process access
> >>to the FS, though.
> >>
> >
> > As long as we think that our current APIs, where each lock/unlock requires
> > at least one network round-trip, I think we shouldn't care very much about
> > this. Locking 10,000 files in one directory is really a corner case, isn't
> > it?
>
> I'm not sure about that.  I currently work with a customer who has a
> pretty large setup, and they're dying for locking to get into place.
> They've already asked me several times about being able to have locks on
> every file in the repository.  I would not put it past them to hit this
> case... and to do it the first time they get locking up and running.
>
And they want to lock all files at a time? This is more a question about
whether our lock/unlock-taking-one-path-at-a-time API is appropriate. If
we want to handle locking hundreds of files at the same time
appropriately, we need "lock/unlock these paths" APIs instead. I think DAV
will require one round-trip per locked path anyway.

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Revised Proposal: Improved locking implementation for fsfs

Posted by John Szakmeister <jo...@szakmeister.net>.
Peter N. Lundblad wrote:
> On Thu, 6 Jan 2005, [UTF-8] Branko �^Libej wrote:
> 
> 
>>Brian W. Fitzpatrick wrote:
>>
>>In short, you avoid quadratic I/O behaviour here in exactly the same way
>>as we did in the WC -- by caching the entries files. I'm not sure if
>>this is a valid thing to do in the face of possibly multi-process access
>>to the FS, though.
>>
> 
> As long as we think that our current APIs, where each lock/unlock requires
> at least one network round-trip, I think we shouldn't care very much about
> this. Locking 10,000 files in one directory is really a corner case, isn't
> it?

I'm not sure about that.  I currently work with a customer who has a
pretty large setup, and they're dying for locking to get into place.
They've already asked me several times about being able to have locks on
every file in the repository.  I would not put it past them to hit this
case... and to do it the first time they get locking up and running.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 6 Jan 2005, [UTF-8] Branko �^Libej wrote:

> Brian W. Fitzpatrick wrote:
>
> In short, you avoid quadratic I/O behaviour here in exactly the same way
> as we did in the WC -- by caching the entries files. I'm not sure if
> this is a valid thing to do in the face of possibly multi-process access
> to the FS, though.
>
As long as we think that our current APIs, where each lock/unlock requires
at least one network round-trip, I think we shouldn't care very much about
this. Locking 10,000 files in one directory is really a corner case, isn't
it?

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Branko Čibej <br...@xbc.nu>.
Brian W. Fitzpatrick wrote:

> mbk explained this to me on irc this afternoon, but after further 
> discussion with Peter Lundblad, I think that we're going to go ahead 
> and use MD5 hashes instead of the incrementing integer--mainly because 
> it allows us to do lock lookup with a single stat (instead of opening 
> and reading each entries file for each path component) as well as 
> store only the MD5 for all of the children in a directory node entry 
> file.  So instead of having this complex hash, we have a list of MD5 
> sums to read through.

Sounds good.

-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Brian W. Fitzpatrick" <fi...@collab.net>.
On Jan 5, 2005, at 8:00 PM, Branko Čibej wrote:

> Brian W. Fitzpatrick wrote:
>
>> On Tue, 2005-01-04 at 15:17, Mark Benedetto King wrote:
>>
>>> On Tue, Jan 04, 2005 at 02:49:00PM -0600, Brian W. Fitzpatrick wrote:
>>>
>>>> If the node is a directory, the second through n lines will contain 
>>>> a
>>>> serialized hash of entries, where the hash key is the name of the 
>>>> child
>>>> component, and the corresponding value is the integer of the 
>>>> filename
>>>> where the node's lock is stored.
>>>>
>>>>
>>> If we consider this data element to be a (somehow delimited) list of
>>> integers, then we can try to limit the size of the individual dirent
>>> files, which should avoid quadratic I/O behaviour (relevant when
>>> locking many files in the same directory).
>>>
>>
>> Can you explain this in more detail please?  I don't understand what
>> you're saying.
>>
>    foo -> 1001
>    bar -> 1002
>    baz -> 1003
>
> trivially maps to
>
>    (foo, bar, baz) -> [1001..1003]
>
> In short, you can compact sequential lists of numbers. The trouble is, 
> of course, that the delimiters are all different, so you can't 
> compress those as in the case of a "normal" list of integers, where, 
> e.g.,
>
>    1,2,3,5,6,7
>
> becomes
>
>    (1..3),(5..7)
>
> Now if we assume that, on average, the length of a file name is about 
> the same as the lenght of a decimally-encoded integer, and you locked 
> all the files in a directory, you'd save around 50% space by 
> compressing the list into a number range. That may seem like a lot, 
> but it's still only a constant factor that vanishes in the noise when 
> the quadratic component begins to make itself felt. Then there's the 
> fact that opening a file is usually more expensive than reading data 
> from it once it's open.
>
> In short, you avoid quadratic I/O behaviour here in exactly the same 
> way as we did in the WC -- by caching the entries files. I'm not sure 
> if this is a valid thing to do in the face of possibly multi-process 
> access to the FS, though.

mbk explained this to me on irc this afternoon, but after further 
discussion with Peter Lundblad, I think that we're going to go ahead 
and use MD5 hashes instead of the incrementing integer--mainly because 
it allows us to do lock lookup with a single stat (instead of opening 
and reading each entries file for each path component) as well as store 
only the MD5 for all of the children in a directory node entry file.  
So instead of having this complex hash, we have a list of MD5 sums to 
read through.

-Fitz

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Branko Čibej <br...@xbc.nu>.
Brian W. Fitzpatrick wrote:

>On Tue, 2005-01-04 at 15:17, Mark Benedetto King wrote:
>  
>
>>On Tue, Jan 04, 2005 at 02:49:00PM -0600, Brian W. Fitzpatrick wrote:
>>    
>>
>>>If the node is a directory, the second through n lines will contain a
>>>serialized hash of entries, where the hash key is the name of the child
>>>component, and the corresponding value is the integer of the filename
>>>where the node's lock is stored.
>>>
>>>      
>>>
>>If we consider this data element to be a (somehow delimited) list of
>>integers, then we can try to limit the size of the individual dirent
>>files, which should avoid quadratic I/O behaviour (relevant when
>>locking many files in the same directory).
>>    
>>
>
>Can you explain this in more detail please?  I don't understand what
>you're saying.
>  
>
    foo -> 1001
    bar -> 1002
    baz -> 1003

trivially maps to

    (foo, bar, baz) -> [1001..1003]

In short, you can compact sequential lists of numbers. The trouble is, 
of course, that the delimiters are all different, so you can't compress 
those as in the case of a "normal" list of integers, where, e.g.,

    1,2,3,5,6,7

becomes

    (1..3),(5..7)

Now if we assume that, on average, the length of a file name is about 
the same as the lenght of a decimally-encoded integer, and you locked 
all the files in a directory, you'd save around 50% space by compressing 
the list into a number range. That may seem like a lot, but it's still 
only a constant factor that vanishes in the noise when the quadratic 
component begins to make itself felt. Then there's the fact that opening 
a file is usually more expensive than reading data from it once it's open.

In short, you avoid quadratic I/O behaviour here in exactly the same way 
as we did in the WC -- by caching the entries files. I'm not sure if 
this is a valid thing to do in the face of possibly multi-process access 
to the FS, though.


-- Brane



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by "Brian W. Fitzpatrick" <fi...@collab.net>.
On Tue, 2005-01-04 at 15:17, Mark Benedetto King wrote:
> On Tue, Jan 04, 2005 at 02:49:00PM -0600, Brian W. Fitzpatrick wrote:
> > 
> > If the node is a directory, the second through n lines will contain a
> > serialized hash of entries, where the hash key is the name of the child
> > component, and the corresponding value is the integer of the filename
> > where the node's lock is stored.
> > 
> 
> If we consider this data element to be a (somehow delimited) list of
> integers, then we can try to limit the size of the individual dirent
> files, which should avoid quadratic I/O behaviour (relevant when
> locking many files in the same directory).

Can you explain this in more detail please?  I don't understand what
you're saying.

-Fitz, the confused


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Revised Proposal: Improved locking implementation for fsfs

Posted by Mark Benedetto King <mb...@lowlatency.com>.
On Tue, Jan 04, 2005 at 02:49:00PM -0600, Brian W. Fitzpatrick wrote:
> 
> If the node is a directory, the second through n lines will contain a
> serialized hash of entries, where the hash key is the name of the child
> component, and the corresponding value is the integer of the filename
> where the node's lock is stored.
> 

If we consider this data element to be a (somehow delimited) list of
integers, then we can try to limit the size of the individual dirent
files, which should avoid quadratic I/O behaviour (relevant when
locking many files in the same directory).

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org