Posted to users@subversion.apache.org by Luke Mauldin <lu...@icloud.com> on 2021/12/13 19:30:52 UTC

Compression Question

From reading the documentation, I can see that Subversion 1.14 supports both zlib and lz4 compression.  I am running Subversion on FreeBSD 13.X on ZFS, which supports native zstd compression.  Some of the repos I host are relatively large (60K revisions and 60GB+), and I am wondering what combination will give me the best performance.  Currently, I have Subversion compression disabled and ZFS with zstd compression enabled.  In this setup, ZFS reports a compression ratio of 1.69X.  I would think that native zstd support in Subversion would be best, but since it does not exist, I wanted to see if anyone has recommendations.

Luke

Re: Compression Question

Posted by Mark McKeown <ma...@wandisco.com>.
There is a set of benchmarks comparing the algorithms (lz4, zstd, zlib,
etc.) and their tradeoffs here:

http://facebook.github.io/zstd/

cheers
Mark




Re: Compression Question

Posted by Nathan Hartman <ha...@gmail.com>.
On Mon, Dec 13, 2021 at 2:31 PM Luke Mauldin <lu...@icloud.com> wrote:
>
> From reading the documentation, I can see that Subversion 1.14 supports both zlib and lz4 compression.  I am running Subversion on FreeBSD 13.X on ZFS which supports native zstd compression.  Some of the repos I host are relatively large (60K revisions and 60GB+) and I am wondering what combination will give me the best performance?  Currently, I have Subversion compression disabled and ZFS with zstd compression enabled.  In this setup, ZFS reports a compression ratio of 1.69X.  I would think if Subversion natively supported ZSTD compression that would be best but since it does not, I just wanted to see if anyone had recommendations?


As I understand it, the motivation for adding LZ4 compression (added
in 1.10) was speed. From vague memory (I haven't looked into
compression algorithms recently), I think zlib achieves a better
compression ratio in terms of disk space saved, but LZ4 is faster. I
haven't had experience with zstd yet.

It is difficult to say which compression format would give the "best"
performance for a particular application without some experimentation
because things like hardware I/O speeds and the nature of the data
being compressed affect the outcome.

Are you looking for the best speed, the best compression ratio, or a
good tradeoff between the two?

If you want to conserve disk space, I would suggest (if it's feasible,
and on a separate machine, not in production) producing a dumpfile
and loading it twice, once with zlib and once with LZ4, and then
comparing the resulting on-disk sizes to that of the volumes on zstd.
Note Subversion's data deduplication feature: if this was turned off in
the past or is off now, some or all of your repo might contain
duplicated data; to make the experiment "fair" you would need to take
this into account.
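
The size comparison described above could be sketched roughly as follows. This is only an illustration, not a Subversion tool: the function name tree_sizes and the example paths are hypothetical. It reports both the logical size (sum of file lengths) and the allocated size (blocks actually used), because on a compressing file system such as ZFS the allocated figure is generally the one that reflects the compression, much as du does.

```python
import os


def tree_sizes(root):
    """Return (logical_bytes, allocated_bytes) summed over every file under root."""
    logical = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            logical += st.st_size
            # st_blocks is in 512-byte units on POSIX systems; where it is
            # missing (e.g. Windows), fall back to the logical size.
            blocks = getattr(st, "st_blocks", None)
            allocated += st.st_size if blocks is None else blocks * 512
    return logical, allocated


# Example use (placeholder paths, one test repository per compression setting):
#   for path in ("/tmp/repo-zlib", "/tmp/repo-lz4", "/tmp/repo-none-on-zstd"):
#       print(path, tree_sizes(path))
```

For repositories whose own compression is on, compare the logical sizes; for the uncompressed repository on a zstd dataset, the allocated size is the relevant number. The file system may take a moment to update block counts after a large load.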

If you are looking for best performance in terms of speed, I don't
have a simple answer for this because it depends on a great many
variables, of which Subversion's compression is but one. I would
assume that networking I/O probably plays a bigger role than
compression here.

Hope this helps,
Nathan

Re: Compression Question

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, Dec 14, 2021 at 10:56 AM Luke Mauldin <lu...@icloud.com> wrote:
>
> When compression is enabled at the SVN level, what exactly does it compress? Does it just compress the original file content and the deltas?

I am really just a layman so cannot explain it accurately ... but here goes.

SVN does not really store the original file content. It stores a DELTA,
which is itself a form of compression. I believe it uses the xdelta
algorithm, and when you add a new file it just creates the delta
against an empty file. The structure of the revision files in an FSFS
repository is explained here:

http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure

One section of the file is the "representations". This is where the
file DELTA would be stored. I believe the representations are the only
part of the revision file where additional "compression" might then be
applied (using zlib or lz4). So the representation is always somewhat
compressed in that it is a DELTA and then it can optionally be
additionally compressed using zlib or lz4. The revision file contains
other housekeeping and indexing data that is not compressed.

Of course, if you are storing this on ZFS with compression enabled, then
the entire file is simply compressed by the file system.
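
A toy illustration of the "delta first, then optional compression" idea described above. It is emphatically not Subversion's on-disk format: difflib stands in for xdelta, pickle stands in for the real representation encoding, and the names make_delta and apply_delta are made up for this sketch. It only shows that a new file becomes a delta against an empty base, that a small edit becomes a small delta, and that a general-purpose compressor can then be applied on top.

```python
import difflib
import pickle
import zlib


def make_delta(old: bytes, new: bytes):
    """Line-based toy delta: ('copy', start, end) and ('insert', data) ops."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))  # reuse lines already in the base
        elif j2 > j1:
            ops.append(("insert", b"".join(new_lines[j1:j2])))  # new data
    return ops


def apply_delta(old: bytes, ops) -> bytes:
    """Rebuild the new text from the base text plus the delta ops."""
    old_lines = old.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.append(op[1])
    return b"".join(out)


base = b"".join(b"line %d of the file\n" % i for i in range(1000))
edited = base.replace(b"line 500 of", b"line 500 (changed) of")

added = make_delta(b"", base)       # adding a file: delta against empty
changed = make_delta(base, edited)  # later revision: delta against the base

# Optional second step: compress the serialized delta (zlib here).
stored_added = zlib.compress(pickle.dumps(added))
stored_changed = zlib.compress(pickle.dumps(changed))

print("full text:", len(edited), "bytes")
print("delta for the edit:", len(pickle.dumps(changed)), "bytes")
print("compressed new-file delta:", len(stored_added), "bytes")
```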

Mark

Re: Compression Question

Posted by Luke Mauldin <lu...@icloud.com>.
When compression is enabled at the SVN level, what exactly does it compress? Does it just compress the original file content and the deltas?


Re: Compression Question

Posted by Mark Phippard <ma...@gmail.com>.
On Mon, Dec 13, 2021 at 2:31 PM Luke Mauldin <lu...@icloud.com> wrote:
>
> From reading the documentation, I can see that Subversion 1.14 supports both zlib and lz4 compression.  I am running Subversion on FreeBSD 13.X on ZFS which supports native zstd compression.  Some of the repos I host are relatively large (60K revisions and 60GB+) and I am wondering what combination will give me the best performance?  Currently, I have Subversion compression disabled and ZFS with zstd compression enabled.  In this setup, ZFS reports a compression ratio of 1.69X.  I would think if Subversion natively supported ZSTD compression that would be best but since it does not, I just wanted to see if anyone had recommendations?
>

I think what you are already doing is the best option. This should
give you the best performance. It would be of little value (for you)
if Subversion provided zstd compression since it is being done at the
file system layer.
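
One way to see why stacking a second compressor on already-compressed data tends to add little: compress some sample data, then compress the result again. In this sketch zlib is used for both passes simply because it ships with Python; it stands in for "Subversion compression" and "file system compression" and is not a benchmark of zstd or ZFS. The sample data is synthetic and the helper make_sample is hypothetical.

```python
import random
import zlib


def make_sample(n_words=200_000, seed=0):
    """Build repeatable pseudo-text from a small vocabulary."""
    rng = random.Random(seed)
    vocab = [b"commit", b"revision", b"branch", b"merge", b"trunk", b"tag",
             b"delta", b"repository", b"file", b"property", b"update",
             b"checkout", b"conflict", b"log", b"diff", b"author"]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


data = make_sample()
once = zlib.compress(data, 6)   # first layer
twice = zlib.compress(once, 6)  # second layer over already-compressed bytes

first_ratio = len(data) / len(once)
second_ratio = len(once) / len(twice)

print(f"first pass:  {len(data)} -> {len(once)} bytes ({first_ratio:.2f}x)")
print(f"second pass: {len(once)} -> {len(twice)} bytes ({second_ratio:.2f}x)")
```

Real repository data will behave differently in detail, so measuring on a copy of the actual repository remains the only reliable answer.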

Mark