You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by cm...@apache.org on 2010/06/22 17:40:01 UTC

svn commit: r956921 - /subversion/trunk/subversion/libsvn_fs_fs/structure

Author: cmpilato
Date: Tue Jun 22 15:40:00 2010
New Revision: 956921

URL: http://svn.apache.org/viewvc?rev=956921&view=rev
Log:
Correct the FSFS structure documentation for lock storage, which
doesn't appear to match the implementation.

If a repository has a locked file /A/D/G/rho, there will be a
serialized hash file for that path (as an MD5 digest,
".../db/2d9/2d9ce8aaac06331d75dae9dad43473bd", in this example), and
that digest file will be directly referenced from the digest files for
/A/D/G, /A/D, /A and /.  The documentation implies that the digest
file for /A/D/G/rho will only be referenced by a digest file for
/A/D/G (which is then referenced by the digest file for /A/D, which
itself is referenced by the digest for /A, etc.)

* subversion/libsvn_fs_fs/structure
  (Locks layout): Correct the lock discovery algorithm description.

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/structure

Modified: subversion/trunk/subversion/libsvn_fs_fs/structure
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/structure?rev=956921&r1=956920&r2=956921&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/structure (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/structure Tue Jun 22 15:40:00 2010
@@ -500,7 +500,8 @@ names are the first 3 characters of the 
 
 Also stored in the digest file for a given FS path are pointers to
 other digest files which contain information associated with other FS
-paths that are our path's immediate children.
+paths that are beneath our path (an immediate child thereof, or a
+grandchild, or a great-grandchild, ...).
 
 To answer the question, "Does path FOO have a lock associated with
 it?", one need only generate the MD5 digest of FOO's
@@ -515,4 +516,4 @@ To inquire about locks on children of th
 reference the same path as above, but look for a list of children in
 that file (instead of lock information).  Children are listed as MD5
 digests, too, so you would simply iterate over those digests and
-consult the files they reference, and so on, recursively.
+consult the files they reference for lock information.



Re: svn commit: r956921 - /subversion/trunk/subversion/libsvn_fs_fs/structure

Posted by "C. Michael Pilato" <cm...@collab.net>.
C. Michael Pilato wrote:
> Ooh... ouch.  Yes, this is a bug.  The lock setting code goes about things
> one way, and the lock deleting code expects another.  I think it turns out
> okay because the lock system gracefully ignores missing lock files, but I
> further predict that it means that the serialized hash file for, say, the
> root of the repository, will always contain pointers to serialized hash
> files (which may or may not actually exist) for each and every file ever
> locked inside that repository.  For a repository of a million files, that
> could eventually mean a million records -- perhaps all of which are dead --
> in that file.  Forever.
> 
> I'll dig around some to try to confirm this, and get an issue filed.

Tracking this in issue #3660 now:
http://subversion.tigris.org/issues/show_bug.cgi?id=3660

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn commit: r956921 - /subversion/trunk/subversion/libsvn_fs_fs/structure

Posted by "C. Michael Pilato" <cm...@collab.net>.
C. Michael Pilato wrote:
> cmpilato@apache.org wrote:
>> Author: cmpilato
>> Date: Tue Jun 22 15:40:00 2010
>> New Revision: 956921
>>
>> URL: http://svn.apache.org/viewvc?rev=956921&view=rev
>> Log:
>> Correct the FSFS structure documentation for lock storage, which
>> doesn't appear to match the implementation.
>>
>> If a repository has a locked file /A/D/G/rho, there will be a
>> serialized hash file for that path (as an MD5 digest,
>> ".../db/2d9/2d9ce8aaac06331d75dae9dad43473bd", in this example), and
>> that digest file will be directly referenced from the digest files for
>> /A/D/G, /A/D, /A and /.  The documentation implies that the digest
>> file for /A/D/G/rho will only be referenced by a digest file for
>> /A/D/G (which is then referenced by the digest file for /A/D, which
>> itself is referenced by the digest for /A, etc.)
> 
> By the way, I think this was an accident in the implementation.  A reading
> of the code leads you to believe that the original intent was to essentially
> mirror the FS path structure.  I think a single mistake (the failure to
> update a stringbuf_t with a new value on every iteration) resulted in the
> behavior we have today.
> 
> I can't decide if this is a happy accident or a bug we should address.  It
> actually seems to make some of the common queries much faster than they
> would otherwise be, but at the potential cost of disk usage and memory
> consumption.  I mean, in a ginormous repository with 10,000 locked files,
> there's a serialized hash file (or maybe right many of them) with thousands
> of entries in it.  Makes finding those thousands of entries really fast,
> after you've parsed the file and loaded that thousands-of-entries-having
> hash into memory.

Ooh... ouch.  Yes, this is a bug.  The lock setting code goes about things
one way, and the lock deleting code expects another.  I think it turns out
okay because the lock system gracefully ignores missing lock files, but I
further predict that it means that the serialized hash file for, say, the
root of the repository, will always contain pointers to serialized hash
files (which may or may not actually exist) for each and every file ever
locked inside that repository.  For a repository of a million files, that
could eventually mean a million records -- perhaps all of which are dead --
in that file.  Forever.

I'll dig around some to try to confirm this, and get an issue filed.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn commit: r956921 - /subversion/trunk/subversion/libsvn_fs_fs/structure

Posted by "C. Michael Pilato" <cm...@collab.net>.
cmpilato@apache.org wrote:
> Author: cmpilato
> Date: Tue Jun 22 15:40:00 2010
> New Revision: 956921
> 
> URL: http://svn.apache.org/viewvc?rev=956921&view=rev
> Log:
> Correct the FSFS structure documentation for lock storage, which
> doesn't appear to match the implementation.
> 
> If a repository has a locked file /A/D/G/rho, there will be a
> serialized hash file for that path (as an MD5 digest,
> ".../db/2d9/2d9ce8aaac06331d75dae9dad43473bd", in this example), and
> that digest file will be directly referenced from the digest files for
> /A/D/G, /A/D, /A and /.  The documentation implies that the digest
> file for /A/D/G/rho will only be referenced by a digest file for
> /A/D/G (which is then referenced by the digest file for /A/D, which
> itself is referenced by the digest for /A, etc.)

By the way, I think this was an accident in the implementation.  A reading
of the code leads you to believe that the original intent was to essentially
mirror the FS path structure.  I think a single mistake (the failure to
update a stringbuf_t with a new value on every iteration) resulted in the
behavior we have today.

I can't decide if this is a happy accident or a bug we should address.  It
actually seems to make some of the common queries much faster than they
would otherwise be, but at the potential cost of disk usage and memory
consumption.  I mean, in a ginormous repository with 10,000 locked files,
there's a serialized hash file (or maybe right many of them) with thousands
of entries in it.  Makes finding those thousands of entries really fast,
after you've parsed the file and loaded that thousands-of-entries-having
hash into memory.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand