Posted to notifications@couchdb.apache.org by eiri <gi...@git.apache.org> on 2015/06/24 19:52:13 UTC

[GitHub] couchdb-couch pull request: Remove compression's optimization

GitHub user eiri opened a pull request:

    https://github.com/apache/couchdb-couch/pull/61

    Remove compression's optimization

    When file compression is set to snappy, couch performs an additional
    optimization step: it also compresses the term with deflate,
    compares the sizes of the resulting binaries and keeps the smaller one.
    As a result, in a snappy-compressed database the 'winning'
    deflate-compressed term gets decompressed and compressed
    back into deflate on every document write.
    
    This patch removes that optimization.
    
    A [basic test](http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d)
    demonstrates that the disk space gained by it is not significant
    enough to justify the wasted CPU cycles.
    
    This closes COUCHDB-2726

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eiri/couchdb-couch remove-compressor-optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/couchdb-couch/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #61
    
----
commit 5b7a0581a772ec3108bac8417216533706a69529
Author: Eric Avdey <ei...@eiri.ca>
Date:   2015-06-24T14:50:39Z

    Remove compression's optimization
    
    When file compression is set to snappy, couch performs an additional
    optimization step: it also compresses the term with deflate,
    compares the sizes of the resulting binaries and keeps the smaller one.
    As a result, in a snappy-compressed database the 'winning'
    deflate-compressed term gets decompressed and compressed
    back into deflate on every document write.
    
    This patch removes that optimization.
    A [basic test](http://nbviewer.ipython.org/gist/eiri/79d91a797af9c6a6ff6d)
    demonstrates that the disk space gained by it is not significant
    enough to justify the wasted CPU cycles.
    
    This closes COUCHDB-2726

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by eiri <gi...@git.apache.org>.
GitHub user eiri commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/61#issuecomment-115334559
  
    @kxepal Thanks, I stumbled upon it recently and am still wrapping my head around it :smile_cat:
    
    The size diff is going to grow, sure, but at a rather glacial pace. I ran a quick comparison on a database of 10000 docs. The optimization accounts for a size diff of something like 200K for 30MB of _pure_ data growth, and the delta itself seems to be linear. I've updated the notebook's gist with pretty graphs for reference.




[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by janl <gi...@git.apache.org>.
GitHub user janl commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/61#issuecomment-115022702
  
    nice one!



[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by kxepal <gi...@git.apache.org>.
GitHub user kxepal commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/61#issuecomment-115023861
  
    I believe this difference will only grow with database size, right?
    
    P.S. Nice IPyNotebook usage here!



[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by davisp <gi...@git.apache.org>.
GitHub user davisp commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/61#issuecomment-121363539
  
    Ahh, subtle. It took me a few minutes of reading to realize that when the optimization kicks in we store the deflate version. On a subsequent read/write we realize it's not snappy, so we do the whole snappy compression again, but then realize the snappy version is slightly bigger.
    
    +1 to merge.
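
The cycle described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not CouchDB's Erlang code: the one-byte tags, the function names and the work counters are all hypothetical, and because snappy is not in the Python standard library, a zlib level-0 (stored) stream plays the role of the faster-but-larger codec. The point is just to show why a blob that "won" as deflate gets decompressed and recompressed on every later write, while a blob kept in the configured format is reused untouched.

```python
import zlib

# Hypothetical one-byte tags marking which codec produced a stored blob.
FAST_TAG = b"S"     # stand-in for snappy: zlib level 0 (stored), NOT real snappy
DEFLATE_TAG = b"D"  # deflate: zlib level 9

calls = {"compress": 0, "decompress": 0}  # counters for the work performed


def _compress(data: bytes, tag: bytes) -> bytes:
    calls["compress"] += 1
    level = 0 if tag == FAST_TAG else 9
    return tag + zlib.compress(data, level)


def _decompress(blob: bytes) -> bytes:
    calls["decompress"] += 1
    return zlib.decompress(blob[1:])  # drop the tag byte first


def compress_term(data: bytes, optimize: bool) -> bytes:
    """Compress for a database configured with the fast codec."""
    fast = _compress(data, FAST_TAG)
    if not optimize:
        return fast
    # The removed optimization: also try deflate and keep the smaller result.
    deflated = _compress(data, DEFLATE_TAG)
    return deflated if len(deflated) < len(fast) else fast


def rewrite_term(blob: bytes, optimize: bool) -> bytes:
    """Re-store an existing blob, e.g. on a later document write."""
    if blob[:1] == FAST_TAG:
        return blob  # already in the configured format, reuse as is
    # Tag does not match the configured codec: decompress and start over.
    return compress_term(_decompress(blob), optimize)


def simulate(data: bytes, writes: int, optimize: bool):
    """Store `data` once, then re-store it `writes - 1` more times."""
    calls["compress"] = calls["decompress"] = 0
    blob = compress_term(data, optimize)
    for _ in range(writes - 1):
        blob = rewrite_term(blob, optimize)
    return blob, dict(calls)


if __name__ == "__main__":
    data = b"couchdb " * 1000
    for optimize in (True, False):
        blob, work = simulate(data, writes=5, optimize=optimize)
        print(optimize, blob[:1], len(blob), work)
```

With the optimization on, every re-store costs one decompression and two compressions in this sketch; with it off, the stored blob already matches the configured codec and is passed through.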



[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by eiri <gi...@git.apache.org>.
GitHub user eiri commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/61#issuecomment-125178603
  
    Merged as per https://github.com/apache/couchdb-couch/commit/3a26ea1ba09e50da3b97b64e6e1ebf75c9406202
    
    Closing



[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by kxepal <gi...@git.apache.org>.
GitHub user kxepal commented on the pull request:

    https://github.com/apache/couchdb-couch/pull/61#issuecomment-116786483
  
    It seems the biggest saving is in CPU cycles, which should be noticeable on weak hosts. But the disk space reduction is a nice small bonus (:
    
    +1



[GitHub] couchdb-couch pull request: Remove compression's optimization

Posted by eiri <gi...@git.apache.org>.
GitHub user eiri closed the pull request at:

    https://github.com/apache/couchdb-couch/pull/61

