Posted to notifications@couchdb.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/24 19:53:04 UTC
[jira] [Commented] (COUCHDB-2726) Remove a compression's over-optimization

    [ https://issues.apache.org/jira/browse/COUCHDB-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599829#comment-14599829 ] 

ASF GitHub Bot commented on COUCHDB-2726:
-----------------------------------------

GitHub user eiri opened a pull request:

    https://github.com/apache/couchdb-couch/pull/61

    Remove compression's optimization

    When file compression is set to snappy, couch does an additional
    optimization step: it also compresses the term with deflate,
    compares the sizes of the resulting binaries and chooses the smaller
    one. This leads to a situation where, for a snappy-compressed
    database, the 'winning' deflate-compressed term gets decompressed and
    compressed back into deflate on each document write.
    
    This patch removes this compression optimization.
    
    A [basic test](http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d)
    demonstrates that the disk space it saves is not significant
    enough to justify the wasted CPU cycles.
    
    This closes COUCHDB-2726

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eiri/couchdb-couch remove-compressor-optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #61
    
----
commit 5b7a0581a772ec3108bac8417216533706a69529
Author: Eric Avdey <ei...@eiri.ca>
Date:   2015-06-24T14:50:39Z

    Remove compression's optimization
    
    When file compression is set to snappy, couch does an additional
    optimization step: it also compresses the term with deflate,
    compares the sizes of the resulting binaries and chooses the smaller
    one. This leads to a situation where, for a snappy-compressed
    database, the 'winning' deflate-compressed term gets decompressed and
    compressed back into deflate on each document write.
    
    This patch removes this compression optimization.
    A [basic test](http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d)
    demonstrates that the disk space it saves is not significant
    enough to justify the wasted CPU cycles.
    
    This closes COUCHDB-2726

----


> Remove a compression's over-optimization
> ----------------------------------------
>
>                 Key: COUCHDB-2726
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2726
>             Project: CouchDB
>          Issue Type: Improvement
>      Security Level: public(Regular issues) 
>            Reporter: Eric Avdey
>            Assignee: Eric Avdey
>            Priority: Minor
>
> When file compression is set to snappy, couch does an additional optimization step: it also compresses the term with deflate, compares the sizes of the resulting binaries and chooses the smaller one. This leads to a situation where the "winning" deflate-compressed term gets decompressed and compressed back on each document update, because deflate-compressed terms are not recognized when the option file_compression is set to snappy. This behaviour exists to allow migration from deflate to snappy.
> However, this optimization is a problem, because couch keeps the `body` field of the #doc record as a 2-element tuple of the compressed body and the compressed list of attachment pointers. If the document has no attachments, the pointers are an empty list, which deflate always compresses better than snappy. In other words, if the option file_compression is set to snappy, almost every document in every database goes through a decompression/compression cycle on each write.
> A basic test shows that this compression optimization saves on average less than one percent of disk space, so it is not worth trading CPU cycles for that space.
> http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d
> I suggest removing this optimization altogether and simply following the configured option when choosing the compression library.
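
The write-path behaviour described above can be modelled with a short Python sketch. It is a hypothetical model, not CouchDB's Erlang code: zlib stands in for deflate; the hypothetical `snappy_stand_in` replaces snappy (not in Python's standard library) and simply stores its input uncompressed, to model a term on which snappy loses to deflate; the one-byte `D`/`S` tags are invented so a stored binary records which codec wrote it. The `optimize` flag switches between the pre-patch behaviour (keep the smaller of the two results) and simply following the configured option.

```python
import zlib

# Hypothetical one-byte tags so a stored binary records which codec wrote it.
DEFLATE_TAG = b"D"
SNAPPY_TAG = b"S"


def deflate(raw: bytes) -> bytes:
    """Deflate stand-in built on zlib."""
    return DEFLATE_TAG + zlib.compress(raw)


def snappy_stand_in(raw: bytes) -> bytes:
    """Hypothetical snappy stand-in: stores the input uncompressed,
    modelling a term on which snappy loses to deflate."""
    return SNAPPY_TAG + raw


def decompress(stored: bytes) -> bytes:
    tag, payload = stored[:1], stored[1:]
    return zlib.decompress(payload) if tag == DEFLATE_TAG else payload


def write(stored: bytes, optimize: bool) -> tuple[bytes, bool]:
    """Re-store a term with file_compression = snappy.

    Returns the new binary and whether a decompress/compress cycle ran.
    """
    if stored[:1] == SNAPPY_TAG:
        # Already in the configured format: reused as-is.
        return stored, False
    # Not recognized as snappy, so decompress and compress again.
    raw = decompress(stored)
    candidate = snappy_stand_in(raw)
    if optimize:
        # Pre-patch behaviour: also try deflate and keep the smaller binary.
        alternative = deflate(raw)
        if len(alternative) < len(candidate):
            candidate = alternative
    return candidate, True


def count_recompressions(optimize: bool, writes: int) -> int:
    # A compressible term that deflate "won" on an earlier write.
    stored = deflate(b"a" * 1000)
    cycles = 0
    for _ in range(writes):
        stored, recompressed = write(stored, optimize)
        cycles += int(recompressed)
    return cycles
```

Under these assumptions, `count_recompressions(True, 5)` runs a decompress/compress cycle on all five writes, because the deflate result keeps winning and is never recognized as snappy, while `count_recompressions(False, 5)` runs it only once, on the first write that converts the term to the configured format.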



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)