You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@couchdb.apache.org by da...@apache.org on 2017/05/09 21:37:07 UTC

[couchdb] branch COUCHDB-3298-optimize-writing-btree-nodes updated (fd0e0e3 -> 4c5a4f7)

This is an automated email from the ASF dual-hosted git repository.

davisp pushed a change to branch COUCHDB-3298-optimize-writing-btree-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git.

  discards  fd0e0e3   Fix check for prefix of suffix
  discards  fd244ec   Refactor optimization
  discards  40b195f   Refactored optimization
  discards  58bc8e6   debug
  discards  d810603   Revert "Make couch_btree:chunkify/1 prefer fewer chunks"
     omits  d9623fb   Apply the same optimization for prefix writes
     omits  de622c6   Opimize writing KV node append writes
       new  56eafca   Revert "Make couch_btree:chunkify/1 prefer fewer chunks"
       new  4c5a4f7   Opimize writing KV node append writes

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (fd0e0e3)
            \
             N -- N -- N   refs/heads/COUCHDB-3298-optimize-writing-btree-nodes (4c5a4f7)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omits" are not gone; other references still
refer to them.  Any revisions marked "discards" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.


Summary of changes:
 rel/overlay/etc/default.ini                   |  6 +++---
 src/couch/include/couch_db.hrl                |  3 +--
 src/couch/src/couch_btree.erl                 | 23 +++--------------------
 src/couch_mrview/src/couch_mrview_index.erl   |  7 +------
 src/couch_mrview/src/couch_mrview_updater.erl |  1 -
 src/couch_mrview/src/couch_mrview_util.erl    |  6 ++----
 6 files changed, 10 insertions(+), 36 deletions(-)

-- 
To stop receiving notification emails like this one, please contact
['"commits@couchdb.apache.org" <co...@couchdb.apache.org>'].

[couchdb] 01/02: Revert "Make couch_btree:chunkify/1 prefer fewer chunks"

Posted by da...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

davisp pushed a commit to branch COUCHDB-3298-optimize-writing-btree-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git

commit 56eafca869d6ba339fc38d64ff87d70c6e711355
Author: Paul J. Davis <pa...@gmail.com>
AuthorDate: Tue May 9 10:55:34 2017 -0500

    Revert "Make couch_btree:chunkify/1 prefer fewer chunks"
    
    This reverts commit 8556adbb98e79a09ec254967ee6acf3bef8d1fb6.
---
 src/couch/src/couch_btree.erl | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/couch/src/couch_btree.erl b/src/couch/src/couch_btree.erl
index adbc92b..d61daf1 100644
--- a/src/couch/src/couch_btree.erl
+++ b/src/couch/src/couch_btree.erl
@@ -342,9 +342,11 @@ complete_root(Bt, KPs) ->
 % it's probably really inefficient.
 
 chunkify(InList) ->
-    ChunkThreshold = get_chunk_size(),
+    BaseChunkSize = get_chunk_size(),
     case ?term_size(InList) of
-    Size when Size > ChunkThreshold ->
+    Size when Size > BaseChunkSize ->
+        NumberOfChunksLikely = ((Size div BaseChunkSize) + 1),
+        ChunkThreshold = Size div NumberOfChunksLikely,
         chunkify(InList, ChunkThreshold, [], 0, []);
     _Else ->
         [InList]

-- 
To stop receiving notification emails like this one, please contact
"commits@couchdb.apache.org" <co...@couchdb.apache.org>.

[couchdb] 02/02: Opimize writing KV node append writes

Posted by da...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

davisp pushed a commit to branch COUCHDB-3298-optimize-writing-btree-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git

commit 4c5a4f77c885c6893fb7e0f2123c0d5aef9a1473
Author: Paul J. Davis <pa...@gmail.com>
AuthorDate: Wed May 3 12:27:08 2017 -0500

    Opimize writing KV node append writes
    
    As it turns out, the original change in COUCHDB-3298 ends up hurting
    disk usage when a view emits large amounts of data (i.e., more than
    half of the btree chunk size). The cause for this is that instead of
    writing single element nodes it would instead prefer to write kv nodes
    with three elements. While normally we might prefer this in memory, it
    turns out that our append only storage this causes a significantly more
    amount of trash on disk.
    
    We can show this with a few trivial examples. Imagine we write KV's a
    through f. The two following patterns show the nodes as we write each
    new kv.
    
        Before 3298:
    
        []
        [a]
        [a, b]
        [a, b]', [c]
        [a, b]', [c, d]
        [a, b]', [c, d]', [e]
        [a, b]', [c, d]', [e, f]
    
        After 3298:
    
        []
        [a]
        [a, b]
        [a, b, c]
        [a, b]', [c, d]
        [a, b]', [c, d, e]
        [a, b]', [c, d]', [e, f]
    
    The thing to realize here is which of these nodes end up as garbage. In
    the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
    that have been orphaned. Where as in the second case we end up with
    [a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
    orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
    is due to how a btree update works. For instance, when adding c, we read
    [a, b] into memory, append c, and then during our node write we call
    chunkify which gives us back [a, b], [c] which leads us to writing [a,
    b] a second time.
    
    The main benefit of this patch is to realize when its possible to reuse
    a node that already exists on disk. It achieves this by looking at the
    list of key/values when writing new nodes and comparing it to the old
    list of key/values for the node read from disk. By checking to see if
    the old list exists unchanged in the new list we can just reuse the old
    node. Node reuse is limited to when the old node is larger than 50% of
    the chunk threshold to maintain the B+Tree properties.
    
    The disk usage improvements this gives can also be quite dramatic. In
    the case above when we have ordered keys with large values (> 50% of the
    btree chunk size) we find upwards of 50% less disk usage. Random keys
    also benefit as well though to a lesser extent depending on disk size
    (as they will often be in the middle of an existing node which prevents
    our optimization).
---
 src/couch/src/couch_btree.erl | 64 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/src/couch/src/couch_btree.erl b/src/couch/src/couch_btree.erl
index d61daf1..ea224b1 100644
--- a/src/couch/src/couch_btree.erl
+++ b/src/couch/src/couch_btree.erl
@@ -19,6 +19,8 @@
 
 -include_lib("couch/include/couch_db.hrl").
 
+-define(FILL_RATIO, 0.5).
+
 extract(#btree{extract_kv=undefined}, Value) ->
     Value;
 extract(#btree{extract_kv=Extract}, Value) ->
@@ -398,7 +400,14 @@ modify_node(Bt, RootPointerInfo, Actions, QueryOutput) ->
         {LastKey, _LastValue} = element(tuple_size(NodeTuple), NodeTuple),
         {ok, [{LastKey, RootPointerInfo}], QueryOutput2};
     _Else2 ->
-        {ok, ResultList} = write_node(Bt, NodeType, NewNodeList),
+        {ok, ResultList} = case RootPointerInfo of
+        nil ->
+            write_node(Bt, NodeType, NewNodeList);
+        _ ->
+            {LastKey, _LastValue} = element(tuple_size(NodeTuple), NodeTuple),
+            OldNode = {LastKey, RootPointerInfo},
+            write_node(Bt, OldNode, NodeType, NodeList, NewNodeList)
+        end,
         {ok, ResultList, QueryOutput2}
     end.
 
@@ -442,6 +451,59 @@ write_node(#btree{fd = Fd, compression = Comp} = Bt, NodeType, NodeList) ->
     ],
     {ok, ResultList}.
 
+
+write_node(Bt, _OldNode, NodeType, [], NewList) ->
+    write_node(Bt, NodeType, NewList);
+write_node(Bt, _OldNode, NodeType, [_], NewList) ->
+    write_node(Bt, NodeType, NewList);
+write_node(Bt, OldNode, NodeType, OldList, NewList) ->
+    case can_reuse_old_node(OldList, NewList) of
+        {true, Prefix, Suffix} ->
+            {ok, PrefixKVs} = case Prefix of
+                [] -> {ok, []};
+                _ -> write_node(Bt, NodeType, Prefix)
+            end,
+            {ok, SuffixKVs} = case Suffix of
+                [] -> {ok, []};
+                _ -> write_node(Bt, NodeType, Suffix)
+            end,
+            Result = PrefixKVs ++ [OldNode] ++ SuffixKVs,
+            {ok, Result};
+        false ->
+            write_node(Bt, NodeType, NewList)
+    end.
+
+can_reuse_old_node(OldList, NewList) ->
+    {Prefix, RestNewList} = remove_prefix_kvs(hd(OldList), NewList),
+    case old_list_is_prefix(OldList, RestNewList, 0) of
+        {true, Size, Suffix} ->
+            ReuseThreshold = get_chunk_size() * ?FILL_RATIO,
+            if Size < ReuseThreshold -> false; true ->
+                {true, Prefix, Suffix}
+            end;
+        false ->
+            false
+    end.
+
+remove_prefix_kvs(KV1, [KV2 | Rest]) when KV2 < KV1 ->
+    {Prefix, RestNewList} = remove_prefix_kvs(KV1, Rest),
+    {[KV2 | Prefix], RestNewList};
+remove_prefix_kvs(_, RestNewList) ->
+    {[], RestNewList}.
+
+% No more KV's in the old node so its a prefix
+old_list_is_prefix([], Suffix, Size) ->
+    {true, Size, Suffix};
+% Some KV's have been removed from the old node
+old_list_is_prefix(_OldList, [], _Size) ->
+    false;
+% KV is equal in both old and new node so continue
+old_list_is_prefix([KV | Rest1], [KV | Rest2], Acc) ->
+    old_list_is_prefix(Rest1, Rest2, ?term_size(KV) + Acc);
+% KV mismatch between old and new node so not a prefix
+old_list_is_prefix(_OldList, _NewList, _Acc) ->
+    false.
+
 modify_kpnode(Bt, {}, _LowerBound, Actions, [], QueryOutput) ->
     modify_node(Bt, nil, Actions, QueryOutput);
 modify_kpnode(_Bt, NodeTuple, LowerBound, [], ResultNode, QueryOutput) ->

-- 
To stop receiving notification emails like this one, please contact
"commits@couchdb.apache.org" <co...@couchdb.apache.org>.