Posted to commits@couchdb.apache.org by va...@apache.org on 2018/08/15 21:05:55 UTC
[couchdb] branch master updated: Reduce size of #leaf.atts keys
This is an automated email from the ASF dual-hosted git repository.
vatamane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/couchdb.git
The following commit(s) were added to refs/heads/master by this push:
new 28ba48d Reduce size of #leaf.atts keys
28ba48d is described below
commit 28ba48da5877a5d471253c1b587af7e3b3121fd9
Author: Nick Vatamaniuc <va...@apache.org>
AuthorDate: Wed Aug 15 15:23:52 2018 -0400
Reduce size of #leaf.atts keys
The `#leaf.atts` data structure is a `[{Position, AttachmentLength}, ...]` proplist
which keeps track of attachment lengths and is used when calculating the
external data size of documents. `Position` is supposed to uniquely identify an
attachment in a file stream. Initially it was just an integer file offset;
after some refactoring work, it became a list of `{Position, Size}` tuples.
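For context, here is a minimal sketch of how such a proplist can be consumed; the helper below is hypothetical (the real accounting lives elsewhere in CouchDB's document size calculation), but it shows why positions matter:

```erlang
%% atts_demo.erl - hypothetical helper summing attachment lengths from a
%% #leaf.atts-style proplist. The Position keys serve only to deduplicate
%% entries, so an attachment referenced more than once is counted once.
-module(atts_demo).
-export([external_size/1]).

external_size(Atts) ->
    %% lists:usort/1 drops duplicate {Position, AttLen} entries.
    lists:sum([Len || {_Pos, Len} <- lists:usort(Atts)]).
```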
During the PSE work, streams were abstracted so that each engine can supply
its own stream implementation. The position in the stream then became a tuple
that looks like `{couch_bt_engine_stream,{<0.1922.0>,[{4267,21}]}}`. This was
written to the file in the `#leaf.atts` data structure. While still correct, it
is unnecessarily verbose, wasting around 100 bytes per attachment, per leaf.
To fix this, use the disk-serialized version of the stream position as returned
by `couch_stream:to_disk_term/1`. In the case of the default CouchDB engine
implementation, this should avoid writing the module name and the pid value for
each attachment entry.
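The savings can be sketched by comparing the serialized sizes of the two terms. This assumes, as the message above suggests, that the default engine's disk term is just the position/size list without the module name and pid; the comparison itself is illustrative, not taken from the patch:

```erlang
%% size_demo.erl - a rough sketch of the per-attachment savings.
%% Assumption: couch_stream:to_disk_term/1 returns only the
%% position/size list for the default couch_bt_engine stream.
-module(size_demo).
-export([compare/0]).

compare() ->
    %% Verbose pre-fix term written per attachment, per leaf
    %% (self() stands in for the stream process pid):
    Old = {couch_bt_engine_stream, {self(), [{4267, 21}]}},
    %% Compact disk term written after this change:
    New = [{4267, 21}],
    %% external_size/1 gives the serialized (external term format)
    %% size in bytes; the atom and pid dominate the old term.
    {erlang:external_size(Old), erlang:external_size(New)}.
```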
---
src/couch/src/couch_att.erl | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/couch/src/couch_att.erl b/src/couch/src/couch_att.erl
index 16edd66..a24de21 100644
--- a/src/couch/src/couch_att.erl
+++ b/src/couch/src/couch_att.erl
@@ -308,8 +308,14 @@ size_info([]) ->
{ok, []};
size_info(Atts) ->
Info = lists:map(fun(Att) ->
- [{_, Pos}, AttLen] = fetch([data, att_len], Att),
- {Pos, AttLen}
+ AttLen = fetch(att_len, Att),
+ case fetch(data, Att) of
+ {stream, StreamEngine} ->
+ {ok, SPos} = couch_stream:to_disk_term(StreamEngine),
+ {SPos, AttLen};
+ {_, SPos} ->
+ {SPos, AttLen}
+ end
end, Atts),
{ok, lists:usort(Info)}.