Posted to notifications@couchdb.apache.org by "Eric Avdey (JIRA)" <ji...@apache.org> on 2015/06/24 19:31:04 UTC

[jira] [Created] (COUCHDB-2726) Remove a compression's over-optimization

Eric Avdey created COUCHDB-2726:
-----------------------------------

             Summary: Remove a compression's over-optimization
                 Key: COUCHDB-2726
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2726
             Project: CouchDB
          Issue Type: Improvement
      Security Level: public (Regular issues)
            Reporter: Eric Avdey


When file_compression is set to snappy, couch performs an additional optimization step: it also compresses the term with deflate, compares the sizes of the resulting binaries and keeps the smaller one. This leads to a situation where a "winning" deflated term gets decompressed and compressed back on every document update, because deflate-compressed terms are not recognized as already compressed when file_compression is set to snappy. This behaviour exists to allow migration from deflate to snappy.
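
For illustration, here is a minimal sketch of the logic described above. It is not the actual couch_compress source; it assumes the snappy NIF API CouchDB uses (snappy:compress/1 returning {ok, Bin}) and uses term_to_binary's built-in zlib compression for the deflate side:

    %% Illustrative sketch only, not the real couch_compress code.
    -define(SNAPPY_PREFIX, 1).

    compress(Term, snappy) ->
        %% deflate via term_to_binary's built-in compressed external format
        Deflated = term_to_binary(Term, [{compressed, 9}, {minor_version, 1}]),
        Plain = term_to_binary(Term, [{minor_version, 1}]),
        {ok, Snappy0} = snappy:compress(Plain),
        Snappied = <<?SNAPPY_PREFIX, Snappy0/binary>>,
        %% the over-optimization: keep whichever binary is smaller,
        %% even though file_compression is set to snappy
        case byte_size(Deflated) < byte_size(Snappied) of
            true  -> Deflated;
            false -> Snappied
        end.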

However, this optimization is a problem, because couch keeps the `body` field of the #doc record as a 2-element tuple of the compressed body and a compressed list of attachment pointers. If a document has no attachments, the pointers are an empty list, which deflate always compresses better than snappy. In other words, with file_compression set to snappy, almost every document in every database goes through a decompression/compression cycle on each write.
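
A hypothetical sketch of why that cycle happens (the helper name is illustrative, reusing the compress/2 sketch above): on write, any stored term not tagged as snappy has to be fully decoded and re-compressed, and a deflate-"winning" term is exactly such a term. A deflated term_to_binary result still starts with the external term format byte 131, so binary_to_term inflates it transparently:

    -define(SNAPPY_PREFIX, 1).

    recompress(Bin, snappy) ->
        case Bin of
            <<?SNAPPY_PREFIX, _/binary>> ->
                Bin;                                   %% already snappy: no work
            <<131, _/binary>> ->                       %% plain or deflated ETF term
                compress(binary_to_term(Bin), snappy)  %% full round trip on each write
        end.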

A basic test shows that on average this compression optimization saves less than one percent of disk space, so the space isn't worth the traded CPU cycles:

http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d

I suggest removing this optimization altogether and simply following the configured option when choosing the compression library.
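
In terms of the sketch above, the proposed behaviour would trust the configured option and skip the size comparison entirely (again just the idea, not a patch; the {deflate, Level} clause shape is illustrative):

    -define(SNAPPY_PREFIX, 1).

    compress(Term, snappy) ->
        {ok, Snappied} = snappy:compress(term_to_binary(Term, [{minor_version, 1}])),
        <<?SNAPPY_PREFIX, Snappied/binary>>;
    compress(Term, {deflate, Level}) ->
        term_to_binary(Term, [{compressed, Level}, {minor_version, 1}]).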


