Posted to commits@couchdb.apache.org by va...@apache.org on 2018/08/15 21:05:55 UTC

[couchdb] branch master updated: Reduce size of #leaf.atts keys

This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/couchdb.git


The following commit(s) were added to refs/heads/master by this push:
     new 28ba48d  Reduce size of #leaf.atts keys
28ba48d is described below

commit 28ba48da5877a5d471253c1b587af7e3b3121fd9
Author: Nick Vatamaniuc <va...@apache.org>
AuthorDate: Wed Aug 15 15:23:52 2018 -0400

    Reduce size of #leaf.atts keys
    
    The `#leaf.atts` data structure is a `[{Position, AttachmentLength}, ...]`
    proplist that keeps track of attachment lengths and is used when calculating
    the external data size of documents. `Position` is supposed to uniquely
    identify an attachment in a file stream. Initially it was just an integer
    file offset. Then, after some refactoring work, it became a list of
    `{Position, Size}` tuples.
    
    During the PSE (pluggable storage engine) work, streams were abstracted so
    that each engine can supply its own stream implementation. The position in
    the stream then became a tuple that looks like
    `{couch_bt_engine_stream,{<0.1922.0>,[{4267,21}]}}`. This tuple was written
    to the file as part of the `#leaf.atts` data structure. While still correct,
    it is unnecessarily verbose, wasting around 100 bytes per attachment, per leaf.
    
    To fix it, use the disk-serialized version of the stream position as returned
    from `couch_stream:to_disk_term`. In the case of the default CouchDB engine
    implementation, this should avoid writing the module name and the pid value
    for each attachment entry.
---
 src/couch/src/couch_att.erl | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/couch/src/couch_att.erl b/src/couch/src/couch_att.erl
index 16edd66..a24de21 100644
--- a/src/couch/src/couch_att.erl
+++ b/src/couch/src/couch_att.erl
@@ -308,8 +308,14 @@ size_info([]) ->
     {ok, []};
 size_info(Atts) ->
     Info = lists:map(fun(Att) ->
-        [{_, Pos}, AttLen] = fetch([data, att_len], Att),
-        {Pos, AttLen}
+        AttLen = fetch(att_len, Att),
+        case fetch(data, Att) of
+             {stream, StreamEngine} ->
+                 {ok, SPos} = couch_stream:to_disk_term(StreamEngine),
+                 {SPos, AttLen};
+             {_, SPos} ->
+                 {SPos, AttLen}
+        end
     end, Atts),
     {ok, lists:usort(Info)}.
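
The effect of the change can be illustrated with a small standalone sketch. It is not part of the commit and does not call CouchDB code: the module name `leaf_atts_size_sketch`, the helpers `verbose_key/2` and `compact_key/1`, and the `{Data, AttLen}` pair used in place of a real attachment record are all assumptions made for illustration. `compact_key/1` only mimics what the commit message says `couch_stream:to_disk_term` yields for the default engine (the position list without the module name and pid).

```erlang
-module(leaf_atts_size_sketch).
-export([verbose_key/2, compact_key/1, size_info/1, sizes/0]).

%% Builds a term shaped like the stream position quoted in the commit
%% message: {EngineModule, {FdPid, [{Offset, Length}]}}.
verbose_key(FdPid, Positions) ->
    {couch_bt_engine_stream, {FdPid, Positions}}.

%% ASSUMPTION: hypothetical stand-in for couch_stream:to_disk_term/1.
%% It drops the module name and the pid and keeps only the positions.
compact_key({_EngineModule, {_FdPid, Positions}}) ->
    {ok, Positions}.

%% Simplified version of the patched size_info/1. Each attachment is a
%% hypothetical {Data, AttLen} pair rather than a real couch_att value.
size_info([]) ->
    {ok, []};
size_info(Atts) ->
    Info = lists:map(fun
        ({{stream, StreamEngine}, AttLen}) ->
            %% Open stream: serialize its position before storing it.
            {ok, SPos} = compact_key(StreamEngine),
            {SPos, AttLen};
        ({{_, SPos}, AttLen}) ->
            %% Already a plain on-disk position: use it unchanged.
            {SPos, AttLen}
    end, Atts),
    {ok, lists:usort(Info)}.

%% Returns {VerboseBytes, CompactBytes} for one #leaf.atts-style entry,
%% measured with term_to_binary/1.
sizes() ->
    Stream = verbose_key(self(), [{4267, 21}]),
    {ok, SPos} = compact_key(Stream),
    Verbose = byte_size(term_to_binary({Stream, 21})),
    Compact = byte_size(term_to_binary({SPos, 21})),
    {Verbose, Compact}.
```

Calling `leaf_atts_size_sketch:sizes()` in an Erlang shell returns the two encoded sizes side by side. The exact byte counts depend on the node name embedded in the pid, so they will not necessarily match the roughly 100 bytes cited above, which refers to the on-disk format.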