Posted to dev@subversion.apache.org by Niels Werensteijn <n....@student.utwente.nl> on 2008/01/22 11:51:19 UTC

[PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Summary so far:

New files, or new chunks of files have no source stream (chunk) to 
compare it to, in order to make a diff of it. At this moment, vdelta 
routine is used on these streams. This takes a lot of cpu power. Even 
worse, since the diffs are compressed with zlib later on (both on disk 
and during transmission), vdelta is actually making it harder for zlib to 
compress the stream, resulting in larger streams.

This patch:
In this patch I replace the call to vdelta with a single call to 
svn_txdelta__insert_op and just insert the whole chunk as new data. This 
costs almost no cpu time, and lets zlib compress it better.
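
A minimal sketch of the idea, not the patch itself: the types and the function name below are hypothetical stand-ins and do not match Subversion's real delta structures. It only illustrates that, when there is no source to diff against, the whole target chunk can be described by one "new data" instruction instead of a series of computed copy instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified delta instruction; not Subversion's real type. */
enum toy_action { TOY_COPY_SOURCE, TOY_COPY_TARGET, TOY_NEW_DATA };

struct toy_op {
    enum toy_action action;
    size_t offset;   /* offset into source/target; unused for TOY_NEW_DATA */
    size_t length;   /* number of bytes this instruction produces */
};

struct toy_window {
    struct toy_op ops[8];
    size_t num_ops;
    const char *new_data;   /* literal bytes referenced by TOY_NEW_DATA ops */
    size_t new_data_len;
};

/* Build a window for a target chunk that has no source to diff against:
   emit exactly one instruction that inserts the whole chunk verbatim.
   An empty chunk yields a window with no instructions at all. */
static void toy_window_from_new_chunk(struct toy_window *w,
                                      const char *data, size_t len)
{
    assert(w != NULL && (data != NULL || len == 0));
    w->num_ops = 0;
    w->new_data = data;
    w->new_data_len = len;
    if (len > 0) {
        w->ops[0].action = TOY_NEW_DATA;
        w->ops[0].offset = 0;
        w->ops[0].length = len;
        w->num_ops = 1;
    }
}
```

No matching or hashing happens here, which is where the CPU saving would come from; the literal bytes are left untouched for a later compression step to work on.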

My test results on two repositories confirmed that the on disk size 
shrunk a little bit, and that cpu time was greatly reduced.

My time results:
Export Repository1 (delphi source code): from 58,6 to 42,0 seconds
Export Repository2 (big compressed bin): from 10,71 to 5,47 seconds

[[[
Replace vdelta with a single insert command

When creating deltas where the source stream has zero length,
replace a call to the vdelta routine with the construction
of a single insert command, inserting the whole chunk. This
saves cpu time, and allows compression routines that work on
the delta streams to compress better.

* subversion/libsvn_delta/text_delta.c
    (compute_window): replace call to svn_txdelta__vdelta with
    a call to svn_txdelta__insert_op
]]]


Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by David Glasser <gl...@davidglasser.net>.
2008/1/22 Talden <ta...@gmail.com>:
> Is there a compatibility policy in place for Subversion to set users'
> expectations for the obsolescence of certain combinations of
> repository, server and client?

http://subversion.tigris.org/hacking.html#release-numbering

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Replace vdelta with single insert op

Posted by Branko Čibej <br...@xbc.nu>.
Talden wrote:
> Is there a compatibility policy in place for Subversion to set users'
> expectations for the obsolescence of certain combinations of
> repository, server and client?
>   

Certainly there is. 
http://subversion.tigris.org/hacking.html#release-numbering

[snip]

Note that I don't propose ditching svndiff0, or rendering older clients 
incompatible with newer servers. The effect of completely removing 
vdelta would be that in some situations, clients that don't support 
svndiff1 would appear a bit slower when talking to a new enough server, 
whilst newer clients that do support svndiff1 would appear faster. Such 
a trade-off is completely acceptable, IMHO.

-- Brane



Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by Talden <ta...@gmail.com>.
Is there a compatibility policy in place for Subversion to set users'
expectations for the obsolescence of certain combinations of
repository, server and client?

The project really should have a formal expression of the supported
version combinations - not just in terms of what the current release
supports but what the Nth release down the road will support.  This
allows much more freedom to designers to axe obsoleted code and,
considering that upgrading large repositories can be a significant
undertaking, provides a clear planning path for users/support staff.

Perpetual backwards compatibility (including client to server
compatibility) comes with a pretty hefty cost that hampers a project's
ability to move forward competitively (consider the unpleasant recent
discussions about Java and the general determination to ensure
perpetual backwards compatibility).

--
Talden


On Jan 23, 2008 6:07 AM, Branko Čibej <br...@xbc.nu> wrote:
>
> David Glasser wrote:
> > 2008/1/22 Niels Werensteijn <n....@student.utwente.nl>:
> >
> >> Summary so far:
> >>
> >> New files, or new chunks of files have no source stream (chunk) to
> >> compare it to, in order to make a diff of it. At this moment, vdelta
> >> routine is used on these streams. This takes a lot of cpu power. Even
> >> worse, since the diffs are compressed with zlib later on (both on disk
> >> and during transmission), vdelta is actually making it harder for zlib to
> >> compress the stream, resulting in larger streams.
> >>
> >> This patch:
> >> In this patch I replace the call to vdelta with a single call to
> >> svn_txdelta__insert_op and just insert the whole chunk as new data. This
> >> costs almost no cpu time, and lets zlib compress it better.
> >>
> >> My test results on two repositories confirmed that the on disk size
> >> shrunk a little bit, and that cpu time was greatly reduced.
> >>
> >> My time results:
> >> Export Repository1 (delphi source code): from 58,6 to 42,0 seconds
> >> Export Repository2 (big compressed bin): from 10,71 to 5,47 seconds
> >>
> >
> > This seems like a reasonable idea, but note that we don't always send
> > txdeltas compressed.  Specifically, only the serialization of txdeltas
> > called "svndiff1" is compressed, not "svndiff0".  Some examples of
> > when each can be used:
> >
> > * In FSFS repositories, svndiff1 is only used in repositories of DB
> > format 2 and above.
> >
> > * In the svnserve protocol, svndiff1 is only used if the client and
> > server both support it (and declare so in their header).
> >
> > I'm not sure what controls the use of svndiff1 in the DAV protocol and
> > BDB repositories off the top of my head.
> >
> > My guess (not based on actual experimentation) is that your patch
> > would be a performance regression for cases that svndiff0 is
> > transmitted.  If that's the case (can you test it?  One way is to
> > create an FSFS repository with --pre-1.4.x-compatible and compare
> > sizes), we'd probably want to rev a bunch of functions to add a
> > boolean flag "output_will_be_compressed" or something which gets
> > passed all the way down to your one-line change.
> >
>
> Bah. --pre-1.4.x-compatible is for old clients accessing via file://.
> svndiff0 over the wire is for old clients, period. I think burning that
> bit all the way through our APIs is more hassle than upgrading clients.
>
> -- Brane
>

Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by Niels Werensteijn <n....@student.utwente.nl>.
Branko Čibej schreef:
> David Glasser wrote:
>> On Jan 22, 2008 9:07 AM, Branko Čibej <br...@xbc.nu> wrote:
>>  
>>> Bah. --pre-1.4.x-compatible is for old clients accessing via file://.
>>> svndiff0 over the wire is for old clients, period. I think burning that
>>> bit all the way through our APIs is more hassle than upgrading clients.
>>>     
>>
>> I'll buy that for the FS; although it's also for "old repositories who
>> haven't done a dump and load, the only officially supported way of
>> upgrading repository format", I suspect that 1.5 will require an
>> "svnadmin upgrade" command anyway.
>>   
> 
> IMHO the inconvenience of forcing an upgrade of clients that access the 
> repository locally is mitigated by the fact that they /do/ access the 
> repository locally, in other words, it's quite likely that you'll have 
> complete control over the client version when you upgrade the server and 
> the repository.
> 
>> Do we have actual numbers on client versions anywhere?  It wasn't long
>> ago that it was uncommon to find 1.4 in stable versions of various
>> distributions.
>>   
> 
> I don't think that's so important. Remember that we never send actual 
> data from the repo to the client; it's always munged and deltas 
> recomputed in some way. So this over-the-net slowdown would only affect 
> pre-1.4 clients talking to a 1.5 server during initial checkout, when we 
> can't send deltas.
And, I suspect, new file checkins from the new clients to the old 
servers. But yes, the point would be that this happens relatively 
rarely, mostly during initial checkout and export, I think.

I am not in favour of updating the API. It is a big hassle and like you 
said before, it is "only" a performance issue, not a compatibility 
issue. But then, who am I to say? :)

Regards,
Niels


Re: [PATCH] Replace vdelta with single insert op

Posted by Branko Čibej <br...@xbc.nu>.
David Glasser wrote:
> On Jan 24, 2008 3:26 AM, Branko Čibej <br...@xbc.nu> wrote:
>   
>> I don't think that's so important. Remember that we never send actual
>> data from the repo to the client; it's always munged and deltas
>> recomputed in some way. So this over-the-net slowdown would only affect
>> pre-1.4 clients talking to a 1.5 server during initial checkout, when we
>> can't send deltas.
>>     
>
> Um, what do you mean "when we can't send deltas"?

We send self-compressed files, not deltas between two files, during 
initial checkout and export. So we do agree on semantics, if not on 
terminology. :)

>   The whole point
> here was that even the original checkout of a file may be computed as
> a delta against the empty stream.  (Unless there's some special-case
> code somewhere to not do this for the first checkout, but I don't
> think so...)
>
> That said, I suspect your analysis of the trade-offs is correct; it
> would just be nice to have some real usage numbers.
>   

Actually, since the send-self-compressed-file-over-the-wire code is in a 
different place than the store-self-compressed-fulltext-in-the-repository 
code, we theoretically /could/ use vdelta self-compression in the 
over-the-wire case, and would then be no worse off with pre-1.4 clients 
than we are now.

-- Brane


Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by David Glasser <gl...@davidglasser.net>.
On Jan 24, 2008 3:26 AM, Branko Čibej <br...@xbc.nu> wrote:
> David Glasser wrote:
> > On Jan 22, 2008 9:07 AM, Branko Čibej <br...@xbc.nu> wrote:
> >
> >> Bah. --pre-1.4.x-compatible is for old clients accessing via file://.
> >> svndiff0 over the wire is for old clients, period. I think burning that
> >> bit all the way through our APIs is more hassle than upgrading clients.
> >>
> >
> > I'll buy that for the FS; although it's also for "old repositories who
> > haven't done a dump and load, the only officially supported way of
> > upgrading repository format", I suspect that 1.5 will require an
> > "svnadmin upgrade" command anyway.
> >
>
> IMHO the inconvenience of forcing an upgrade of clients that access the
> repository locally is mitigated by the fact that they /do/ access the
> repository locally, in other words, it's quite likely that you'll have
> complete control over the client version when you upgrade the server and
> the repository.

Actually, my point here was just that the statement "when you upgrade
the server and the repository" makes it sound like "upgrade the
repository" is an easy, constant-time operation.  We could easily make
that the case (because the current format changes never require more
than making some new files or directories and changing the "format"
files), but all we really support is dump and load.

> > Do we have actual numbers on client versions anywhere?  It wasn't long
> > ago that it was uncommon to find 1.4 in stable versions of various
> > distributions.
> >
>
> I don't think that's so important. Remember that we never send actual
> data from the repo to the client; it's always munged and deltas
> recomputed in some way. So this over-the-net slowdown would only affect
> pre-1.4 clients talking to a 1.5 server during initial checkout, when we
> can't send deltas.

Um, what do you mean "when we can't send deltas"?  The whole point
here was that even the original checkout of a file may be computed as
a delta against the empty stream.  (Unless there's some special-case
code somewhere to not do this for the first checkout, but I don't
think so...)

That said, I suspect your analysis of the trade-offs is correct; it
would just be nice to have some real usage numbers.

--dave


-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by Branko Čibej <br...@xbc.nu>.
David Glasser wrote:
> On Jan 22, 2008 9:07 AM, Branko Čibej <br...@xbc.nu> wrote:
>   
>> Bah. --pre-1.4.x-compatible is for old clients accessing via file://.
>> svndiff0 over the wire is for old clients, period. I think burning that
>> bit all the way through our APIs is more hassle than upgrading clients.
>>     
>
> I'll buy that for the FS; although it's also for "old repositories who
> haven't done a dump and load, the only officially supported way of
> upgrading repository format", I suspect that 1.5 will require an
> "svnadmin upgrade" command anyway.
>   

IMHO the inconvenience of forcing an upgrade of clients that access the 
repository locally is mitigated by the fact that they /do/ access the 
repository locally, in other words, it's quite likely that you'll have 
complete control over the client version when you upgrade the server and 
the repository.

> Do we have actual numbers on client versions anywhere?  It wasn't long
> ago that it was uncommon to find 1.4 in stable versions of various
> distributions.
>   

I don't think that's so important. Remember that we never send actual 
data from the repo to the client; it's always munged and deltas 
recomputed in some way. So this over-the-net slowdown would only affect 
pre-1.4 clients talking to a 1.5 server during initial checkout, when we 
can't send deltas.

-- Brane


Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by David Glasser <gl...@davidglasser.net>.
On Jan 22, 2008 9:07 AM, Branko Čibej <br...@xbc.nu> wrote:
>
> David Glasser wrote:
> > 2008/1/22 Niels Werensteijn <n....@student.utwente.nl>:
> >
> >> Summary so far:
> >>
> >> New files, or new chunks of files have no source stream (chunk) to
> >> compare it to, in order to make a diff of it. At this moment, vdelta
> >> routine is used on these streams. This takes a lot of cpu power. Even
> >> worse, since the diffs are compressed with zlib later on (both on disk
> >> and during transmission), vdelta is actually making it harder for zlib to
> >> compress the stream, resulting in larger streams.
> >>
> >> This patch:
> >> In this patch I replace the call to vdelta with a single call to
> >> svn_txdelta__insert_op and just insert the whole chunk as new data. This
> >> costs almost no cpu time, and lets zlib compress it better.
> >>
> >> My test results on two repositories confirmed that the on disk size
> >> shrunk a little bit, and that cpu time was greatly reduced.
> >>
> >> My time results:
> >> Export Repository1 (delphi source code): from 58,6 to 42,0 seconds
> >> Export Repository2 (big compressed bin): from 10,71 to 5,47 seconds
> >>
> >
> > This seems like a reasonable idea, but note that we don't always send
> > txdeltas compressed.  Specifically, only the serialization of txdeltas
> > called "svndiff1" is compressed, not "svndiff0".  Some examples of
> > when each can be used:
> >
> > * In FSFS repositories, svndiff1 is only used in repositories of DB
> > format 2 and above.
> >
> > * In the svnserve protocol, svndiff1 is only used if the client and
> > server both support it (and declare so in their header).
> >
> > I'm not sure what controls the use of svndiff1 in the DAV protocol and
> > BDB repositories off the top of my head.
> >
> > My guess (not based on actual experimentation) is that your patch
> > would be a performance regression for cases that svndiff0 is
> > transmitted.  If that's the case (can you test it?  One way is to
> > create an FSFS repository with --pre-1.4.x-compatible and compare
> > sizes), we'd probably want to rev a bunch of functions to add a
> > boolean flag "output_will_be_compressed" or something which gets
> > passed all the way down to your one-line change.
> >
>
> Bah. --pre-1.4.x-compatible is for old clients accessing via file://.
> svndiff0 over the wire is for old clients, period. I think burning that
> bit all the way through our APIs is more hassle than upgrading clients.

I'll buy that for the FS; although it's also for "old repositories who
haven't done a dump and load, the only officially supported way of
upgrading repository format", I suspect that 1.5 will require an
"svnadmin upgrade" command anyway.

Do we have actual numbers on client versions anywhere?  It wasn't long
ago that it was uncommon to find 1.4 in stable versions of various
distributions.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/

Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by Branko Čibej <br...@xbc.nu>.
David Glasser wrote:
> 2008/1/22 Niels Werensteijn <n....@student.utwente.nl>:
>   
>> Summary so far:
>>
>> New files, or new chunks of files have no source stream (chunk) to
>> compare it to, in order to make a diff of it. At this moment, vdelta
>> routine is used on these streams. This takes a lot of cpu power. Even
>> worse, since the diffs are compressed with zlib later on (both on disk
>> and during transmission), vdelta is actually making it harder for zlib to
>> compress the stream, resulting in larger streams.
>>
>> This patch:
>> In this patch I replace the call to vdelta with a single call to
>> svn_txdelta__insert_op and just insert the whole chunk as new data. This
>> costs almost no cpu time, and lets zlib compress it better.
>>
>> My test results on two repositories confirmed that the on disk size
>> shrunk a little bit, and that cpu time was greatly reduced.
>>
>> My time results:
>> Export Repository1 (delphi source code): from 58,6 to 42,0 seconds
>> Export Repository2 (big compressed bin): from 10,71 to 5,47 seconds
>>     
>
> This seems like a reasonable idea, but note that we don't always send
> txdeltas compressed.  Specifically, only the serialization of txdeltas
> called "svndiff1" is compressed, not "svndiff0".  Some examples of
> when each can be used:
>
> * In FSFS repositories, svndiff1 is only used in repositories of DB
> format 2 and above.
>
> * In the svnserve protocol, svndiff1 is only used if the client and
> server both support it (and declare so in their header).
>
> I'm not sure what controls the use of svndiff1 in the DAV protocol and
> BDB repositories off the top of my head.
>
> My guess (not based on actual experimentation) is that your patch
> would be a performance regression for cases that svndiff0 is
> transmitted.  If that's the case (can you test it?  One way is to
> create an FSFS repository with --pre-1.4.x-compatible and compare
> sizes), we'd probably want to rev a bunch of functions to add a
> boolean flag "output_will_be_compressed" or something which gets
> passed all the way down to your one-line change.
>   

Bah. --pre-1.4.x-compatible is for old clients accessing via file://. 
svndiff0 over the wire is for old clients, period. I think burning that 
bit all the way through our APIs is more hassle than upgrading clients.

-- Brane


Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by Niels Werensteijn <n....@student.utwente.nl>.
David Glasser schreef:
> 2008/1/22 Niels Werensteijn <n....@student.utwente.nl>:
>> Summary so far:
>>
>> New files, or new chunks of files have no source stream (chunk) to
>> compare it to, in order to make a diff of it. At this moment, vdelta
>> routine is used on these streams. This takes a lot of cpu power. Even
>> worse, since the diffs are compressed with zlib later on (both on disk
>> and during transmission), vdelta is actually making it harder for zlib to
>> compress the stream, resulting in larger streams.
>>
>> This patch:
>> In this patch I replace the call to vdelta with a single call to
>> svn_txdelta__insert_op and just insert the whole chunk as new data. This
>> costs almost no cpu time, and lets zlib compress it better.
>>
>> My test results on two repositories confirmed that the on disk size
>> shrunk a little bit, and that cpu time was greatly reduced.
>>
>> My time results:
>> Export Repository1 (delphi source code): from 58,6 to 42,0 seconds
>> Export Repository2 (big compressed bin): from 10,71 to 5,47 seconds
> 
> This seems like a reasonable idea, but note that we don't always send
> txdeltas compressed.  Specifically, only the serialization of txdeltas
> called "svndiff1" is compressed, not "svndiff0".  Some examples of
> when each can be used:
> 
> * In FSFS repositories, svndiff1 is only used in repositories of DB
> format 2 and above.
> 
> * In the svnserve protocol, svndiff1 is only used if the client and
> server both support it (and declare so in their header).

True. Although there has been some speculation as to how many 
clients/servers still use this. Also note that this patch is still a 
performance gain in CPU time. But if the stream is not already 
compressed, we do indeed lose some size efficiency.

> I'm not sure what controls the use of svndiff1 in the DAV protocol and
> BDB repositories off the top of my head.
> 
> My guess (not based on actual experimentation) is that your patch
> would be a performance regression for cases that svndiff0 is
> transmitted.  If that's the case (can you test it?  One way is to
> create an FSFS repository with --pre-1.4.x-compatible and compare
> sizes)
Ok I did this.

Sizes for repository 1 (using du -s, on ext3):
"protocol"    Without patch    With patch
svndiff0      49720            79384
svndiff1      31920            30692

Sizes for repository 2:
"protocol"    Without patch    With patch
svndiff0      37416            37520
svndiff1      37416            37400

Conclusion: Yes it seems it makes a lot of difference on normal text 
repositories. On repositories with compressed data it makes no real 
difference.
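
To put the tables above in relative terms, here is a small sketch that computes the percentage change per row. The figures are copied from the du -s output as reported (units left as du printed them); the struct and function names are made up for this illustration. For repository 1 the svndiff0 case grows by roughly 60%, while the svndiff1 case shrinks by about 4%.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One measured row: repository size without and with the patch. */
struct size_row {
    const char *repo;
    const char *format;
    double without_patch;
    double with_patch;
};

/* Values copied from the tables above (du -s output). */
static const struct size_row size_rows[] = {
    { "repository 1", "svndiff0", 49720.0, 79384.0 },
    { "repository 1", "svndiff1", 31920.0, 30692.0 },
    { "repository 2", "svndiff0", 37416.0, 37520.0 },
    { "repository 2", "svndiff1", 37416.0, 37400.0 },
};

/* Percentage change from BEFORE to AFTER; positive means growth. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) * 100.0 / before;
}

/* Print one line per measured row with its relative size change. */
static void print_size_changes(void)
{
    size_t i;
    for (i = 0; i < sizeof(size_rows) / sizeof(size_rows[0]); i++) {
        printf("%-13s %-9s %+7.2f%%\n",
               size_rows[i].repo, size_rows[i].format,
               pct_change(size_rows[i].without_patch,
                          size_rows[i].with_patch));
    }
}
```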

btw: Is there some sort of official suite of benchmark repositories? I 
feel a little stupid drawing conclusions on the basis of just 2 
repositories, although they confirm theories and logical thinking :)

>, we'd probably want to rev a bunch of functions to add a
> boolean flag "output_will_be_compressed" or something which gets
> passed all the way down to your one-line change.
Well my, admittedly simple, test does seem to support that, given that 
there are still a significant number of client/servers using svndiff0. 
(I don't know the policies regarding this.)

Regards,
Niels


Re: [PATCH] Replace vdelta with single insert op (was: [PATCH] Replace vdelta with xdelta variant)

Posted by David Glasser <gl...@davidglasser.net>.
2008/1/22 Niels Werensteijn <n....@student.utwente.nl>:
> Summary so far:
>
> New files, or new chunks of files have no source stream (chunk) to
> compare it to, in order to make a diff of it. At this moment, vdelta
> routine is used on these streams. This takes a lot of cpu power. Even
> worse, since the diffs are compressed with zlib later on (both on disk
> and during transmission), vdelta is actually making it harder for zlib to
> compress the stream, resulting in larger streams.
>
> This patch:
> In this patch I replace the call to vdelta with a single call to
> svn_txdelta__insert_op and just insert the whole chunk as new data. This
> costs almost no cpu time, and lets zlib compress it better.
>
> My test results on two repositories confirmed that the on disk size
> shrunk a little bit, and that cpu time was greatly reduced.
>
> My time results:
> Export Repository1 (delphi source code): from 58,6 to 42,0 seconds
> Export Repository2 (big compressed bin): from 10,71 to 5,47 seconds

This seems like a reasonable idea, but note that we don't always send
txdeltas compressed.  Specifically, only the serialization of txdeltas
called "svndiff1" is compressed, not "svndiff0".  Some examples of
when each can be used:

* In FSFS repositories, svndiff1 is only used in repositories of DB
format 2 and above.

* In the svnserve protocol, svndiff1 is only used if the client and
server both support it (and declare so in their header).
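
The "both sides must support it" rule in the svnserve case could be sketched roughly as follows. This is an illustration only: the capability lists are modeled as plain string arrays, and the function names and the capability string used here are assumptions for the example, not a description of the real protocol code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if NAME appears in the COUNT capability strings in CAPS. */
static bool has_cap(const char *const caps[], size_t count, const char *name)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (strcmp(caps[i], name) == 0)
            return true;
    }
    return false;
}

/* Pick the svndiff version for a connection: 1 (compressed) only when
   both peers advertise it, otherwise fall back to 0 (uncompressed).
   The capability string "svndiff1" is an assumption for this sketch. */
static int choose_svndiff_version(const char *const client_caps[],
                                  size_t client_count,
                                  const char *const server_caps[],
                                  size_t server_count)
{
    if (has_cap(client_caps, client_count, "svndiff1")
        && has_cap(server_caps, server_count, "svndiff1"))
        return 1;
    return 0;
}
```

With this shape, an older peer on either end is enough to force the uncompressed format for that connection.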

I'm not sure what controls the use of svndiff1 in the DAV protocol and
BDB repositories off the top of my head.

My guess (not based on actual experimentation) is that your patch
would be a performance regression for cases that svndiff0 is
transmitted.  If that's the case (can you test it?  One way is to
create an FSFS repository with --pre-1.4.x-compatible and compare
sizes), we'd probably want to rev a bunch of functions to add a
boolean flag "output_will_be_compressed" or something which gets
passed all the way down to your one-line change.
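
Roughly, such a flag would end up selecting a strategy like this. The sketch is hypothetical: the enum, the function, and the way the flag arrives are invented for illustration, and only the flag name comes from the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical strategy names for illustration; not Subversion identifiers. */
enum toy_strategy {
    TOY_DIFF_AGAINST_SOURCE,  /* a source exists: compute a real delta */
    TOY_SELF_COMPRESS,        /* no source, output stays uncompressed */
    TOY_SINGLE_INSERT         /* no source, output is compressed later */
};

/* Decide how to encode one window of target data.
   SOURCE_LEN is the length of the source view (0 for new data);
   OUTPUT_WILL_BE_COMPRESSED says whether a later stage will compress
   the serialized delta (the svndiff1 case) or not (the svndiff0 case). */
static enum toy_strategy
toy_choose_strategy(size_t source_len, bool output_will_be_compressed)
{
    if (source_len > 0)
        return TOY_DIFF_AGAINST_SOURCE;
    return output_will_be_compressed ? TOY_SINGLE_INSERT
                                     : TOY_SELF_COMPRESS;
}
```

The decision itself is trivial; the cost is in threading the boolean through every caller between the public API and the point where the window is computed.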

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
