Posted to users@subversion.apache.org by Osvaldo Pinali Doederlein <os...@visionnaire.com.br> on 2009/04/24 19:35:23 UTC

RFE: pack revprops shards

I started this RFE in the Subversion blog: "Packing of the /db/revprops 
shards. These are still accumulating hundreds of thousands of TINY files 
(avg 150 bytes) in my poor Windows server (NTFS really doesn't like 
small files)... with packing, each of these 1000 prop files would be 
replaced by a single ~150Kb blob."

Answer from Hyrum Wright: "Revprops are mutable, and as such their size 
may change. Modifying a packed revprop would cause the entire shard to 
be rewritten, not just the modified value. Aside from the performance 
issues, this also causes race conditions when multiple revprops are 
being edited at the same time. All of these concerns mean that packing 
of revprops probably won't happen any time soon. What might happen is a 
migration of revprops to a better storage mechanism, such as sqlite, 
though there are no current plans for that."

Even though revprops packing (with an ideal behavior) is not easy to 
implement, it's still a highly desirable feature so I propose opening a 
bug to track it. I have some suggestions that may even make this viable 
for a 1.6.x update:

- Yes, revprops are mutable, but in practice they are mostly read-only 
data, particularly for old revisions. It's not uncommon for revprop 
changes to be blocked (e.g. SourceForge.net did that until recently). In 
many companies this is also mandated. In such cases, revprops ARE 
read-only, so revprops could be packed trivially (with the same simple 
file layout used by packed revs).
- So, in a first attempt we could have a simple implementation of packed 
revprops, with the following constraint: Once a specific revprops shard 
is packed, any further attempt to change any of those revprops will be 
refused. The admin could use hooks to make the whole thing smoother, 
e.g. making sure that complete shards are only packed if older than a 
month, or disallowing revprops updates even in non-packed revisions so 
the rule is simpler and there are no update failures.
- Additionally, the update of packed revprops could be supported (with 
the same simple storage format) even though it COULD be an expensive 
operation (lock the entire shard or even the entire repo, rewrite it 
completely). It's a reasonable option if, for somebody's repo, revprop 
updates are permitted but not very common. For some users (like my 
company) who don't make heavy use of revprops, those packed shards 
weigh in at the low hundreds of KB, so the brute-force update would still 
run in a split second with NO usability disadvantage at all.
- This initial design wouldn't require a new repository format. We could 
just have an option like "svnadmin pack --revprops", so by default (with 
format=5) only revs are packed, but one could optionally pack the 
revprops too. The fsfs layer would have to detect if a revprops shard is 
packed, but this is necessary anyway (just like with packed revs) 
because the most recent shard is typically not packed. If a future 
version of SVN introduces a smarter storage for packed revprops that can 
handle frequent updates with low overhead and no coarse-grained locking, 
that would be accommodated and distinguished by a different repository 
format.
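
To make the "simple file layout" idea concrete, here is a minimal sketch
of a read-only packed shard: the per-revision files are concatenated into
one pack file, and a manifest records where each one starts. With 1000
files of about 150 bytes each, the pack comes to roughly 150,000 bytes.
The file names, the manifest format and the function names are
assumptions made for illustration, not the actual FSFS implementation,
and the sketch assumes the revisions in a shard are contiguous.

```python
import os


def pack_shard(shard_dir, pack_path, manifest_path):
    """Concatenate every per-revision file in shard_dir into one pack file.

    The manifest holds the first revision number, then one byte offset per
    revision, then a final offset marking the end of the pack.
    """
    revs = sorted(int(name) for name in os.listdir(shard_dir))
    offsets = []
    with open(pack_path, "wb") as pack:
        for rev in revs:
            offsets.append(pack.tell())
            with open(os.path.join(shard_dir, str(rev)), "rb") as f:
                pack.write(f.read())
        offsets.append(pack.tell())  # sentinel: end of the last revprop
    with open(manifest_path, "w") as manifest:
        manifest.write(f"{revs[0]}\n")
        manifest.writelines(f"{off}\n" for off in offsets)


def read_packed_revprop(pack_path, manifest_path, rev):
    """Return the raw revprop bytes for rev from a packed shard."""
    with open(manifest_path) as manifest:
        first_rev = int(manifest.readline())
        offsets = [int(line) for line in manifest]
    i = rev - first_rev
    start, end = offsets[i], offsets[i + 1]
    with open(pack_path, "rb") as pack:
        pack.seek(start)
        return pack.read(end - start)
```

Because only start offsets are stored, the length of each entry is implied
by the next offset, which is exactly why this layout suits data that never
changes after packing.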

A+
Osvaldo

-- 
-----------------------------------------------------------------------
Osvaldo Pinali Doederlein                        Visionnaire Virtus S/A
osvaldo@visionnaire.com.br                http://www.visionnaire.com.br
Arquiteto de Tecnologia                         +55 (41) 3337-1000 #226

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897002

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: pack revprops shards

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.
On Apr 27, 2009, at 3:39 AM, Bolstridge, Andrew wrote:

>> -----Original Message-----
>> From: Osvaldo Pinali Doederlein [mailto:osvaldo@visionnaire.com.br]
>> Sent: Friday, April 24, 2009 8:35 PM
>> To: users@subversion.tigris.org
>> Subject: RFE: pack revprops shards
>>
>> I started this RFE in the Subversion blog: "Packing of the /db/revprops
>> shards. These are still accumulating hundreds of thousands of TINY
>> files (avg 150 bytes) in my poor Windows server (NTFS really doesn't
>> like small files)... with packing, each of these 1000 prop files would
>> be replaced by a single ~150Kb blob."
>>
>> Answer from Hyrum Wright: "Revprops are mutable, and as such their size
>> may change. Modifying a packed revprop would cause the entire shard to
>> be rewritten, not just the modified value. Aside from the performance
>> issues, this also causes race conditions when multiple revprops are
>> being edited at the same time. All of these concerns mean that packing
>> of revprops probably won't happen any time soon. What might happen is a
>> migration of revprops to a better storage mechanism, such as sqlite,
>> though there are no current plans for that."
>>
>
> The answer to that is to load the shard as a memory-mapped file. Then
> you update only the appropriate revprop. If you're concerned about
> inserting data into the middle of a shard, when packing revprops leave a
> chunk of space after each one. Then you can write into the gap. If the
> amount of data would overflow the gap, then you'd have to fall back to a
> full rewrite of the entire shard.

Not a bad solution.  We'd have to determine a "best guess" value for  
the extra space, so we don't end up negating the space savings with  
the padding.  Also, we'd have to store both an offset and length for  
each revprop, instead of just the offset (which for revision data  
implies the length).  We'd also need to keep a fallback mechanism for  
revprops which overflow their assigned buffers.
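
A rough sketch of what that bookkeeping could look like: each revprop gets
an (offset, length) index entry, a fixed amount of slack follows it, and
an update that no longer fits triggers a full rebuild. The names `Slot`,
`build_shard`, `update_revprop` and the `slack` parameter are hypothetical,
and the shard is held in a bytearray rather than on disk to keep the
example short.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    offset: int  # where the revprop starts in the shard
    length: int  # bytes currently in use (the rest of the slot is padding)


def build_shard(revprops, slack):
    """Lay out revprops back to back, reserving `slack` spare bytes after each."""
    data = bytearray()
    index = []
    for blob in revprops:
        index.append(Slot(len(data), len(blob)))
        data += blob + b"\0" * slack
    return data, index


def capacity(data, index, i):
    """Bytes available to entry i: up to the next entry, or the end of the shard."""
    end = index[i + 1].offset if i + 1 < len(index) else len(data)
    return end - index[i].offset


def read_revprop(data, index, i):
    slot = index[i]
    return bytes(data[slot.offset:slot.offset + slot.length])


def update_revprop(data, index, i, new_blob, slack):
    """Write in place if new_blob fits; otherwise rebuild the whole shard.

    Returns (data, index, rewritten).
    """
    if len(new_blob) <= capacity(data, index, i):
        start = index[i].offset
        data[start:start + len(new_blob)] = new_blob
        index[i].length = len(new_blob)
        return data, index, False
    blobs = [read_revprop(data, index, j) for j in range(len(index))]
    blobs[i] = new_blob
    new_data, new_index = build_shard(blobs, slack)
    return new_data, new_index, True
```

The cost of the padding is easy to read off: 1000 revprops of 150 bytes
with 64 bytes of slack each occupy 214,000 bytes instead of 150,000.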

And this approach *still* has us manually managing on-disk formats and  
such.  I'd like to delegate that to competent libraries, such as sqlite.

> The only question now is - would packing revprops increase performance
> much? I guess they do get read a lot for operations like log, list etc.
> I'd say they get read a lot more than past revisions do, so the
> performance increase might (*might*) be noticeable.

You'd be surprised how much the revision data (not props) are used,  
particularly when reconstructing full-texts using deltas.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1946728


RE: pack revprops shards

Posted by "Bolstridge, Andrew" <an...@intergraph.com>.
> -----Original Message-----
> From: Osvaldo Pinali Doederlein [mailto:osvaldo@visionnaire.com.br]
> Sent: Friday, April 24, 2009 8:35 PM
> To: users@subversion.tigris.org
> Subject: RFE: pack revprops shards
> 
> I started this RFE in the Subversion blog: "Packing of the /db/revprops
> shards. These are still accumulating hundreds of thousands of TINY
> files (avg 150 bytes) in my poor Windows server (NTFS really doesn't
> like small files)... with packing, each of these 1000 prop files would
> be replaced by a single ~150Kb blob."
> 
> Answer from Hyrum Wright: "Revprops are mutable, and as such their size
> may change. Modifying a packed revprop would cause the entire shard to
> be rewritten, not just the modified value. Aside from the performance
> issues, this also causes race conditions when multiple revprops are
> being edited at the same time. All of these concerns mean that packing
> of revprops probably won't happen any time soon. What might happen is a
> migration of revprops to a better storage mechanism, such as sqlite,
> though there are no current plans for that."
> 
> 

The answer to that is to load the shard as a memory-mapped file. Then
you update only the appropriate revprop. If you're concerned about
inserting data into the middle of a shard, when packing revprops leave a
chunk of space after each one. Then you can write into the gap. If the
amount of data would overflow the gap, then you'd have to fall back to a
full rewrite of the entire shard.
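
For illustration only, the in-place write through a memory map might look
like the sketch below. It assumes the caller already knows the slot's
byte offset and reserved capacity, and it pads unused space with NUL
bytes; `write_in_place` is a made-up name, and a real format would also
need to record each value's length.

```python
import mmap


def write_in_place(path, offset, capacity, new_blob):
    """Overwrite one padded slot through a memory map.

    Returns False without touching the file when new_blob does not fit,
    so the caller can fall back to rewriting the whole shard.
    """
    if len(new_blob) > capacity:
        return False
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        # Slice assignment on a mmap must keep the size unchanged,
        # so the new value is padded out to the full slot.
        mm[offset:offset + capacity] = new_blob.ljust(capacity, b"\0")
        mm.flush()
    return True
```

Only the bytes of the one slot are modified; neighbouring revprops in the
shard are left alone.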

The only question now is - would packing revprops increase performance
much? I guess they do get read a lot for operations like log, list etc.
I'd say they get read a lot more than past revisions do, so the
performance increase might (*might*) be noticeable.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1942136



Re: RFE: pack revprops shards

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.
On Apr 24, 2009, at 3:55 PM, Blair Zajac wrote:

> Hyrum K. Wright wrote:
>> Just playing devil's advocate on this issue.  I'm not against it by
>> any means, but have a few questions.
>> On Apr 24, 2009, at 3:11 PM, Blair Zajac wrote:
>>> One could also have the revprops stored as single files if they are
>>> modified after packing.  The lookup code would first try to open the
>>> single revprop file and if it doesn't exist, then it goes to the
>>> packaged file.  The packer could allow for multiple repackings.
>> In repositories which have a large number of modified-then-packed
>> revprops, this leads to much more storage, and double the I/O.  I'm
>> kinda wary of this.
>
> Not that this is the best solution, but I don't think you can say  
> this solution is bad for IO and then claim the sqlite one is a good  
> solution.  sqlite probably does much more IO than one extra open()  
> call that will fail for 95% of the unmodified revprops for this  
> solution.  I like the sqlite solution, I just don't think it wins on  
> IO.

Well, I'd much rather leave the I/O optimization and concurrency  
handling to a third-party library, which can arguably worry about  
these issues better than we can.  It may not be optimal, and it may  
not do less I/O than our current solution, but I think it offers a  
good trade-off between I/O, space, and other factors.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1898348


Re: RFE: pack revprops shards

Posted by Blair Zajac <bl...@orcaware.com>.
Hyrum K. Wright wrote:
> Just playing devil's advocate on this issue.  I'm not against it by  
> any means, but have a few questions.
> 
> On Apr 24, 2009, at 3:11 PM, Blair Zajac wrote:
> 
>> One could also have the revprops stored as single files if they are
>> modified after packing.  The lookup code would first try to open the
>> single revprop file and if it doesn't exist, then it goes to the
>> packaged file.  The packer could allow for multiple repackings.
> 
> In repositories which have a large number of modified-then-packed  
> revprops, this leads to much more storage, and double the I/O.  I'm  
> kinda wary of this.

Not that this is the best solution, but I don't think you can say this solution 
is bad for IO and then claim the sqlite one is a good solution.  sqlite probably 
does much more IO than one extra open() call that will fail for 95% of the 
unmodified revprops for this solution.  I like the sqlite solution, I just don't 
think it wins on IO.

Blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897561


Re: RFE: pack revprops shards

Posted by Blair Zajac <bl...@orcaware.com>.
Osvaldo Pinali Doederlein wrote:
> On 24/04/2009 17:58, Blair Zajac wrote:
>>> Well originally I was thinking that it would be a very reasonable
>>> tradeoff to disable revprop editing on packed revisions.  I think the
>>> majority of times this is used is to fix a log message, and that
>>> usually happens soon after the commit.  It then occurred to me there
>>> are tools like svnsync and svk that use revprops.  I think they
>>> usually use revision 0, so that would be a pretty minor special case.
>> We've gone back in our own svn repos and modified very old commit 
>> messages, so I wouldn't be too happy if I couldn't modify it.
>>
>> Yet, on the other hand, git doesn't allow any log message modification 
>> at all once the commit gets into a public repos as the log message 
>> becomes part of the sha1sum, so I don't know how they deal with that.
>>
> IMHO (as a simple SVN user and admin, not developer), allowing changes 
> to anything in past revisions is bad practice. Reduces the auditing 
> utility of the repository, for one thing. And I guess it could make some 
> features more complex and less efficient, like replication. In my repos 
> I use a pre-commit hook to disallow commits with empty messages, this 
> avoids the most common problem of hitting the 'Commit' button in a GUI 
> like Subclipse, forgetting to type the message... for other mistakes 
> like typos, the error will just live forever in the history, eternal 
> shame on the user, and that's it. Of course this is just my company 
> policy but I suspect it's not very uncommon, that's why I made this RFE 
> assuming that a partial solution of packing w/o support for posterior 
> revprop updates would be a good compromise. Of course, if the SQLite 
> solution is both easy and enables more advanced features, that won't 
> make me unhappy. ;-)

Well, there are times when people attach the completely wrong log message 
to a commit, which would be nice to fix.  Typos and other minor changes 
could go unfixed.

Blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1900751


Re: RFE: pack revprops shards

Posted by Osvaldo Pinali Doederlein <os...@visionnaire.com.br>.
On 24/04/2009 17:58, Blair Zajac wrote:
>> Well originally I was thinking that it would be a very reasonable
>> tradeoff to disable revprop editing on packed revisions.  I think the
>> majority of times this is used is to fix a log message, and that
>> usually happens soon after the commit.  It then occurred to me there
>> are tools like svnsync and svk that use revprops.  I think they
>> usually use revision 0, so that would be a pretty minor special case.
> We've gone back in our own svn repos and modified very old commit 
> messages, so I wouldn't be too happy if I couldn't modify it.
>
> Yet, on the other hand, git doesn't allow any log message modification 
> at all once the commit gets into a public repos as the log message 
> becomes part of the sha1sum, so I don't know how they deal with that.
>
IMHO (as a simple SVN user and admin, not developer), allowing changes 
to anything in past revisions is bad practice. It reduces the auditing 
utility of the repository, for one thing. And I guess it could make some 
features, like replication, more complex and less efficient. In my repos 
I use a pre-commit hook to disallow commits with empty messages; this 
avoids the most common problem of hitting the 'Commit' button in a GUI 
like Subclipse and forgetting to type the message. For other mistakes 
like typos, the error will just live forever in the history, eternal 
shame on the user, and that's it. Of course this is just my company 
policy, but I suspect it's not very uncommon; that's why I made this RFE 
assuming that a partial solution of packing without support for later 
revprop updates would be a good compromise. Of course, if the SQLite 
solution is both easy and enables more advanced features, that won't 
make me unhappy. ;-)
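
For anyone wanting a similar policy, a pre-commit hook along these lines
would do it. This is a minimal sketch, not the hook actually in use here:
it relies on the standard hook arguments (repository path and transaction
name) and on `svnlook log -t`, while the function names and the error
text are made up for the example.

```python
#!/usr/bin/env python3
import subprocess
import sys


def message_is_acceptable(message):
    """Reject log messages that are empty or contain only whitespace."""
    return bool(message.strip())


def main(argv):
    repos, txn = argv[1], argv[2]
    # Ask svnlook for the log message of the pending transaction.
    log = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    if not message_is_acceptable(log):
        sys.stderr.write("Commit rejected: empty log message.\n")
        return 1  # a non-zero exit status aborts the commit
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Saved as hooks/pre-commit in the repository and made executable, the
script stops the commit before a revision is ever created, so no revprop
needs fixing afterwards.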

A+
Osvaldo

-- 
-----------------------------------------------------------------------
Osvaldo Pinali Doederlein                        Visionnaire Virtus S/A
osvaldo@visionnaire.com.br                http://www.visionnaire.com.br
Arquiteto de Tecnologia                         +55 (41) 3337-1000 #226

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1900536


Re: RFE: pack revprops shards

Posted by Blair Zajac <bl...@orcaware.com>.
Mark Phippard wrote:
> On Fri, Apr 24, 2009 at 4:58 PM, Blair Zajac <bl...@orcaware.com> wrote:
> 
>>> Well originally I was thinking that it would be a very reasonable
>>> tradeoff to disable revprop editing on packed revisions.  I think the
>>> majority of times this is used is to fix a log message, and that
>>> usually happens soon after the commit.  It then occurred to me there
>>> are tools like svnsync and svk that use revprops.  I think they
>>> usually use revision 0, so that would be a pretty minor special case.
>> We've gone back in our own svn repos and modified very old commit messages,
>> so I wouldn't be too happy if I couldn't modify it.
> 
> Nothing would be forcing you to pack the repository and if the
> tradeoff is known, where is the loss?

The loss is if you do want both features.

I don't like backing up 190,000 revprop files on our repository, but I would 
like to change them also.

Blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1900654


Re: RFE: pack revprops shards

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Apr 24, 2009 at 4:58 PM, Blair Zajac <bl...@orcaware.com> wrote:

>> Well originally I was thinking that it would be a very reasonable
>> tradeoff to disable revprop editing on packed revisions.  I think the
>> majority of times this is used is to fix a log message, and that
>> usually happens soon after the commit.  It then occurred to me there
>> are tools like svnsync and svk that use revprops.  I think they
>> usually use revision 0, so that would be a pretty minor special case.
>
> We've gone back in our own svn repos and modified very old commit messages,
> so I wouldn't be too happy if I couldn't modify it.

Nothing would be forcing you to pack the repository and if the
tradeoff is known, where is the loss?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897605



Re: RFE: pack revprops shards

Posted by Blair Zajac <bl...@orcaware.com>.
Mark Phippard wrote:
> On Fri, Apr 24, 2009 at 4:44 PM, Hyrum K. Wright
> <hy...@mail.utexas.edu> wrote:
> 
>>>> 2) Allow packing of revprops, but issue an error if an attempt is made
>>>> to edit a packed revision.  This seems like a pretty small need.
>>>> Perhaps rev 0 could always be unpacked since there are some tools that
>>>> store things in the revprops of rev 0.
>> That's a possibility; it'd be kinda like a permanently disabled pre-revprop
>> hook.  However, I can see the complaints from folks who take this step and
>> then want to disable it.  Also, the idea of special casing a particular
>> revision raises red flags in my sense-o-meter.
> 
> Well originally I was thinking that it would be a very reasonable
> tradeoff to disable revprop editing on packed revisions.  I think the
> majority of times this is used is to fix a log message, and that
> usually happens soon after the commit.  It then occurred to me there
> are tools like svnsync and svk that use revprops.  I think they
> usually use revision 0, so that would be a pretty minor special case.

We've gone back in our own svn repos and modified very old commit messages, so I 
wouldn't be too happy if I couldn't modify it.

Yet, on the other hand, git doesn't allow any log message modification at all 
once the commit gets into a public repos as the log message becomes part of the 
sha1sum, so I don't know how they deal with that.
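
To illustrate why the message cannot be edited in place there: a git
commit id is the SHA-1 of a short header plus the commit text, and the
log message is part of that text. The sketch below is simplified (it
ignores optional headers such as signatures), and `git_commit_id` and the
sample author string are invented for the example.

```python
import hashlib


def git_commit_id(tree, author, committer, message, parents=()):
    """Compute the SHA-1 id of a simplified git commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit %d\0" % len(body)  # object type, size, NUL separator
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHO = "A U Thor <author@example.com> 1240600000 +0000"

original = git_commit_id(EMPTY_TREE, WHO, WHO, "Fix teh bug\n")
reworded = git_commit_id(EMPTY_TREE, WHO, WHO, "Fix the bug\n")
```

Here `original` and `reworded` differ, so correcting a message yields a
new commit with a new id rather than a modified old one.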

Blair

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897599


Re: RFE: pack revprops shards

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Apr 24, 2009 at 4:44 PM, Hyrum K. Wright
<hy...@mail.utexas.edu> wrote:

>>> 2) Allow packing of revprops, but issue an error if an attempt is made
>>> to edit a packed revision.  This seems like a pretty small need.
>>> Perhaps rev 0 could always be unpacked since there are some tools that
>>> store things in the revprops of rev 0.
>
> That's a possibility; it'd be kinda like a permanently disabled pre-revprop
> hook.  However, I can see the complaints from folks who take this step and
> then want to disable it.  Also, the idea of special casing a particular
> revision raises red flags in my sense-o-meter.

Well originally I was thinking that it would be a very reasonable
tradeoff to disable revprop editing on packed revisions.  I think the
majority of times this is used is to fix a log message, and that
usually happens soon after the commit.  It then occurred to me there
are tools like svnsync and svk that use revprops.  I think they
usually use revision 0, so that would be a pretty minor special case.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897540



Re: RFE: pack revprops shards

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.
Just playing devil's advocate on this issue.  I'm not against it by  
any means, but have a few questions.

On Apr 24, 2009, at 3:11 PM, Blair Zajac wrote:

> One could also have the revprops stored as single files if they are
> modified after packing.  The lookup code would first try to open the
> single revprop file and if it doesn't exist, then it goes to the
> packaged file.  The packer could allow for multiple repackings.

In repositories which have a large number of modified-then-packed  
revprops, this leads to much more storage, and double the I/O.  I'm  
kinda wary of this.

> Mark Phippard wrote:
>> I see a couple of options:
>>
>> 1) Start storing revprops in SQLite.  We are already using it for
>> rep-sharing.  This removes the need to pack the revprops and possibly
>> even opens the door for future features users have asked for, such as
>> being able to query revprops.

I think this is the best long-term solution.  The implementation  
should be pretty straightforward; it's just a matter of somebody  
picking it up.

>> 2) Allow packing of revprops, but issue an error if an attempt is made
>> to edit a packed revision.  This seems like a pretty small need.
>> Perhaps rev 0 could always be unpacked since there are some tools that
>> store things in the revprops of rev 0.

That's a possibility; it'd be kinda like a permanently disabled
pre-revprop hook.  However, I can see the complaints from folks who take
this step and then want to disable it.  Also, the idea of special
casing a particular revision raises red flags in my sense-o-meter.
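
For the sake of discussion, the rule in option 2 comes down to a check
like the following. It assumes the repository tracks the lowest revision
that is still unpacked (called `min_unpacked_rev` here); that name, the
exception class and the rev 0 exemption are all illustrative.

```python
class PackedRevpropError(Exception):
    """Raised when an edit targets a revision whose revprops are packed."""


def is_packed(rev, min_unpacked_rev):
    # Revision 0 is the special case: it stays editable so that tools
    # which keep their bookkeeping in its revprops continue to work.
    return rev != 0 and rev < min_unpacked_rev


def check_revprop_edit(rev, min_unpacked_rev):
    """Refuse revprop changes on packed revisions, allow everything else."""
    if is_packed(rev, min_unpacked_rev):
        raise PackedRevpropError(
            f"revprops of r{rev} are packed and read-only"
        )
```

The single `rev != 0` test is the whole special case, which is about as
small as such an exemption can get.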

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897503


Re: RFE: pack revprops shards

Posted by Blair Zajac <bl...@orcaware.com>.
One could also have the revprops stored as single files if they are modified 
after packing.  The lookup code would first try to open the single revprop file 
and if it doesn't exist, then it goes to the packaged file.  The packer could 
allow for multiple repackings.
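
Roughly, the lookup could be sketched as below. The "overrides"
directory, the function names and the `read_from_pack` callback are
assumptions for illustration; the point is only that reads try the loose
file first and that edits never touch the pack.

```python
import os


def read_revprop(revprops_dir, rev, read_from_pack):
    """Prefer a loose per-revision file; fall back to the packed shard."""
    loose = os.path.join(revprops_dir, "overrides", str(rev))
    try:
        with open(loose, "rb") as f:
            return f.read()
    except FileNotFoundError:
        # The common case for unmodified revprops: one failed open().
        return read_from_pack(rev)


def write_revprop(revprops_dir, rev, blob):
    """Store an edit made after packing as a loose file next to the pack."""
    overrides = os.path.join(revprops_dir, "overrides")
    os.makedirs(overrides, exist_ok=True)
    tmp = os.path.join(overrides, f"{rev}.tmp")
    with open(tmp, "wb") as f:
        f.write(blob)
    # Rename into place so readers never see a half-written file.
    os.replace(tmp, os.path.join(overrides, str(rev)))
```

A later repacking would fold the loose files back into the pack and
delete them.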

Blair

Mark Phippard wrote:
> I see a couple of options:
> 
> 1) Start storing revprops in SQLite.  We are already using it for
> rep-sharing.  This removes the need to pack the revprops and possibly
> even opens the door for future features users have asked for, such as
> being able to query revprops.
> 
> 2) Allow packing of revprops, but issue an error if an attempt is made
> to edit a packed revision.  This seems like a pretty small need.
> Perhaps rev 0 could always be unpacked since there are some tools that
> store things in the revprops of rev 0.
> 
> 
> 
> On Fri, Apr 24, 2009 at 3:35 PM, Osvaldo Pinali Doederlein
> <os...@visionnaire.com.br> wrote:
>> I started this RFE in the Subversion blog: "Packing of the /db/revprops
>> shards. These are still accumulating hundreds of thousands of TINY files
>> (avg 150 bytes) in my poor Windows server (NTFS really doesn't like
>> small files)... with packing, each of these 1000 prop files would be
>> replaced by a single ~150Kb blob."
>>
>> Answer from Hyrum Wright: "Revprops are mutable, and as such their size
>> may change. Modifying a packed revprop would cause the entire shard to
>> be rewritten, not just the modified value. Aside from the performance
>> issues, this also causes race conditions when multiple revprops are
>> being edited at the same time. All of these concerns mean that packing
>> of revprops probably won't happen any time soon. What might happen is a
>> migration of revprops to a better storage mechanism, such as sqlite,
>> though there are no current plans for that."
>>
>> Even though revprops packing (with an ideal behavior) is not easy to
>> implement, it's still a highly desirable feature so I propose opening a
>> bug to track it. I have some suggestions that may even make this viable
>> for an 1.6.x update:
>>
>> - Yes revprops are mutable, but in practice they are mostly-readonly
>> data, remarkably for old revisions. It's not uncommon that revprops
>> changes be blocked (e.g. SourceForge.net did that until recently). In
>> many companies this is also mandated. In such cases, revprops ARE
>> read-only, so revprops could be packed trivially (with the same simple
>> file layout used by packed revs).
>> - So, in a first attempt we could have a simple implementation of packed
>> revprops, with the following constraint: Once a specific revprops shard
>> is packed, any further attempt to change any of those revprops will be
>> refused. The admin could use hooks to make the whole thing smoother,
>> e.g. making sure that complete shards are only packed if older than a
>> month, or disallowing revprops updates even in non-packed revisions so
>> the rule is simpler and there are no update failures.
>> - Additionally, the update of packed revprops could be supported (with
>> the same simple storage format) even though it COULD be an expensive
>> operation (lock the entire shard or even the entire repo, rewrite it
>> completely). It's a reasonable option if, for somebody's repo, revprop
>> updates are permitted but not very common. For some users (like my
>> company), that don't make heavy use of revprops, those packed shards
>> weight in the low hundreds of Kb, so the brute-force update would still
>> run in a split-second with NO usability disadvantage at all.
>> - This initial design wouldn't require a new repository format. We could
>> just have an option like "svnadmin pack --revprops", so by default (with
>> format=5) only revs are packed, but one could optionally pack the
>> revprops too. The fsfs layer would have to detect if a revprops shard is
>> packed, but this is necessary anyway (just like with packed revs)
>> because the most recent shard is typically not packed. If a future
>> version of SVN introduces a smarter storage for packed revprops that can
>> handle frequent updates with low overhead and no coarse-grained locking,
>> that would be accommodated and distinguished by a different repository
>> format.
>>
>> A+
>> Osvaldo
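
To make the quoted proposal concrete, here is a minimal Python sketch of a packed revprops shard with the two behaviours described above: refusing edits once packed, or doing a brute-force rewrite of the whole shard. The layout (a count, a table of end offsets, then the concatenated blobs) and all function names are assumptions made for illustration only; this is not the FSFS pack format, and locking and atomic replacement of the file on disk are left out.

```python
import struct


def pack_shard(props):
    """Pack per-revision revprop blobs (a list of bytes) into one blob.

    Hypothetical layout: 4-byte count, one 4-byte end offset per
    revision, then the blobs concatenated in revision order.
    """
    header = struct.pack(">I", len(props))
    offsets = b""
    end = 0
    for blob in props:
        end += len(blob)
        offsets += struct.pack(">I", end)
    return header + offsets + b"".join(props)


def read_revprops(packed, index):
    """Return the blob of the index-th revision in the shard."""
    (count,) = struct.unpack_from(">I", packed, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    data_start = 4 + 4 * count
    (end,) = struct.unpack_from(">I", packed, 4 + 4 * index)
    start = 0
    if index > 0:
        # The previous revision's end offset is this revision's start.
        (start,) = struct.unpack_from(">I", packed, 4 + 4 * (index - 1))
    return packed[data_start + start:data_start + end]


def unpack_shard(packed):
    """Return every revision's blob as a list."""
    (count,) = struct.unpack_from(">I", packed, 0)
    return [read_revprops(packed, i) for i in range(count)]


def update_revprops(packed, index, new_blob, allow_rewrite=True):
    """Change one revision's revprops in a packed shard.

    With allow_rewrite=False the edit is refused (the "first attempt"
    constraint). Otherwise the whole shard is unpacked, modified and
    repacked, which is the brute-force update.
    """
    if not allow_rewrite:
        raise PermissionError("revprops of packed revisions are read-only")
    props = unpack_shard(packed)
    props[index] = new_blob
    return pack_shard(props)
```

With 1000 revisions averaging 150 bytes each, the repacked blob is on the order of 150 KB, so the full rewrite touches little data. The hard part that this sketch skips is holding a lock for the duration and swapping the new file in safely.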

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897254

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].


Re: RFE: pack revprops shards

Posted by Mark Phippard <ma...@gmail.com>.
I see a couple of options:

1) Start storing revprops in SQLite.  We are already using it for
rep-sharing.  This removes the need to pack the revprops and possibly
even opens the door for future features users have asked for, such as
being able to query revprops.

2) Allow packing of revprops, but issue an error if an attempt is made
to edit the revprops of a packed revision.  Being able to edit those
seems like a pretty small need.  Perhaps rev 0 could always be left
unpacked, since there are some tools that store things in the revprops
of rev 0.
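
As a rough illustration of option 1, the following Python sketch keeps revprops in a single SQLite table and shows the kind of query that would become possible. The table name, column names and helper functions are all hypothetical; nothing here reflects an actual Subversion schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical revprops database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revprops ("
        " revision INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " value BLOB NOT NULL,"
        " PRIMARY KEY (revision, name))"
    )
    return conn


def set_revprop(conn, revision, name, value):
    """Insert or overwrite one property; only this row is rewritten."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO revprops (revision, name, value)"
            " VALUES (?, ?, ?)",
            (revision, name, value),
        )


def get_revprops(conn, revision):
    """Return all properties of one revision as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM revprops WHERE revision = ?", (revision,)
    )
    return dict(rows.fetchall())


def revisions_by_author(conn, author):
    """Example query: every revision whose svn:author matches."""
    rows = conn.execute(
        "SELECT revision FROM revprops"
        " WHERE name = 'svn:author' AND value = ? ORDER BY revision",
        (author,),
    )
    return [row[0] for row in rows]
```

An edit changes a single row inside a transaction, so the rewrite-the-whole-shard concern does not come up, and the database is one file on disk rather than one file per revision.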



-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1897239