You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@couchdb.apache.org by da...@apache.org on 2017/05/08 23:12:00 UTC
[couchdb] branch COUCHDB-3298-optimize-writing-kv-nodes updated
(6b2ed29 -> d9623fb)
This is an automated email from the ASF dual-hosted git repository.
davisp pushed a change to branch COUCHDB-3298-optimize-writing-kv-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git.
omits 6b2ed29 Opimize writing KV node append writes
adds 50a738a Add jittered sleep during replicator shard scanning
adds 4a63d22 Merge pull request #484 from cloudant/couchdb-3389
adds e38b1d6 Switch to using Travis containerised builds
new de622c6 Opimize writing KV node append writes
new d9623fb Apply the same optimization for prefix writes
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
* -- * -- B -- O -- O -- O (6b2ed29)
\
N -- N -- N refs/heads/COUCHDB-3298-optimize-writing-kv-nodes (d9623fb)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omits" are not gone; other references still
refer to them. Any revisions marked "discards" are gone forever.
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
Summary of changes:
.travis.yml | 59 +++++++++++++++++----
src/couch/src/couch_btree.erl | 54 ++++++++++---------
src/couch/src/couch_multidb_changes.erl | 91 ++++++++++++++++++---------------
3 files changed, 127 insertions(+), 77 deletions(-)
--
To stop receiving notification emails like this one, please contact
['"commits@couchdb.apache.org" <co...@couchdb.apache.org>'].
[couchdb] 01/02: Opimize writing KV node append writes
Posted by da...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
davisp pushed a commit to branch COUCHDB-3298-optimize-writing-kv-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git
commit de622c6c2c86d693df8a3c68351b803db5e04ff6
Author: Paul J. Davis <pa...@gmail.com>
AuthorDate: Wed May 3 12:27:08 2017 -0500
Opimize writing KV node append writes
As it turns out, the original change in COUCHDB-3298 ends up hurting
disk usage when a view emits large amounts of data (i.e., more than
half of the btree chunk size). The cause for this is that instead of
writing single element nodes it would instead prefer to write kv nodes
with three elements. While normally we might prefer this in memory, it
turns out that our append only storage this causes a significantly more
amount of trash on disk.
We can show this with a few trivial examples. Imagine we write KV's a
through f. The two following patterns show the nodes as we write each
new kv.
Before 3298:
[]
[a]
[a, b]
[a, b]', [c]
[a, b]', [c, d]
[a, b]', [c, d]', [e]
[a, b]', [c, d]', [e, f]
After 3298:
[]
[a]
[a, b]
[a, b, c]
[a, b]', [c, d]
[a, b]', [c, d, e]
[a, b]', [c, d]', [e, f]
The thing to realize here is which of these nodes end up as garbage. In
the first example we end up with [a], [a, b], [c], [c, d], and [e] nodes
that have been orphaned. Where as in the second case we end up with
[a], [a, b], [a, b, c], [c, d], [c, d, e] as nodes that have been
orphaned. A quick aside, the reason that [a, b] and [c, d] are orphaned
is due to how a btree update works. For instance, when adding c, we read
[a, b] into memory, append c, and then during our node write we call
chunkify which gives us back [a, b], [c] which leads us to writing [a,
b] a second time.
This patch changes the write function to realize when we're merely
appending KVs and saves us this extra write and generation of garbage.
Its node patterns look like such:
[]
[a]
[a, b]
[a, b], [c]
[a, b], [c, d]
[a, b], [c, d], [e]
[a, b], [c, d], [e, f]
Which means we only end up generating [a], [c], and [e] as garbage (with
respect to kv nodes, kp nodes retain their historical behavior).
---
src/couch/src/couch_btree.erl | 68 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 67 insertions(+), 1 deletion(-)
diff --git a/src/couch/src/couch_btree.erl b/src/couch/src/couch_btree.erl
index adbc92b..1a4ad60 100644
--- a/src/couch/src/couch_btree.erl
+++ b/src/couch/src/couch_btree.erl
@@ -396,7 +396,8 @@ modify_node(Bt, RootPointerInfo, Actions, QueryOutput) ->
{LastKey, _LastValue} = element(tuple_size(NodeTuple), NodeTuple),
{ok, [{LastKey, RootPointerInfo}], QueryOutput2};
_Else2 ->
- {ok, ResultList} = write_node(Bt, NodeType, NewNodeList),
+ {ok, ResultList} = write_node(
+ Bt, RootPointerInfo, NodeType, NodeList, NewNodeList),
{ok, ResultList, QueryOutput2}
end.
@@ -440,6 +441,71 @@ write_node(#btree{fd = Fd, compression = Comp} = Bt, NodeType, NodeList) ->
],
{ok, ResultList}.
+% Don't make our append-only write optimization for
+% kp nodes.
+write_node(Bt, _OldNode, kp_node, _OldList, NewList) ->
+ write_node(Bt, kp_node, NewList);
+
+% If we're creating a new kv node then there's no
+% possibility for the optimization
+write_node(Bt, _OldNode, NodeType, [], NewList) ->
+ write_node(Bt, NodeType, NewList);
+
+% Disable the optimization for nodes that only
+% have a single element so we don't end up increasing
+% the number of reads when folding a btree
+write_node(Bt, _OldNode, NodeType, [_], NewList) ->
+ write_node(Bt, NodeType, NewList);
+
+% If a KV node has had a new key appended to the
+% end of its list we can instead take the appended
+% KVs and create a new node while reusing the old
+% node already on disk. This saves us both the effort
+% of writing data that's already on disk as well as
+% saves us the disk space that would have been
+% orphaned by not reusing the old node.
+write_node(Bt, OldNode, NodeType, OldList, NewList) ->
+ case is_append_only(OldList, NewList) of
+ false ->
+ write_node(Bt, NodeType, NewList);
+ {true, Suffix} ->
+ case old_node_full(OldList) of
+ true ->
+ {ok, Results} = write_node(Bt, NodeType, Suffix),
+ {OldLastKey, _} = lists:last(OldList),
+ {ok, [{OldLastKey,OldNode} | Results]};
+ false ->
+ write_node(Bt, NodeType, NewList)
+ end
+ end.
+
+% This function will blow up if OldList == NewList
+% on purpose as an assertion that modify_node
+% doesn't provide this input.
+%
+% First clause finds our suff
+is_append_only([], [_ | _] = Suffix) ->
+ {true, Suffix};
+
+% We've removed keys from the node
+is_append_only([_ | _], []) ->
+ false;
+
+% We've found a mismatch of keys in the node
+is_append_only([KV1 | _], [KV2 | _]) when KV1 /= KV2 ->
+ false;
+
+% Current key is equal, keep going
+is_append_only([KV | Rest1], [KV | Rest2]) ->
+ is_append_only(Rest1, Rest2).
+
+old_node_full(OldList) ->
+ ChunkThreshold = get_chunk_size(),
+ NodeSize = lists:foldl(fun(KV, Acc) ->
+ Acc + ?term_size(KV)
+ end, 0, OldList),
+ NodeSize >= ChunkThreshold.
+
modify_kpnode(Bt, {}, _LowerBound, Actions, [], QueryOutput) ->
modify_node(Bt, nil, Actions, QueryOutput);
modify_kpnode(_Bt, NodeTuple, LowerBound, [], ResultNode, QueryOutput) ->
--
To stop receiving notification emails like this one, please contact
"commits@couchdb.apache.org" <co...@couchdb.apache.org>.
[couchdb] 02/02: Apply the same optimization for prefix writes
Posted by da...@apache.org.
This is an automated email from the ASF dual-hosted git repository.
davisp pushed a commit to branch COUCHDB-3298-optimize-writing-kv-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git
commit d9623fb6047d1fa8832710d171547020c153d355
Author: Paul J. Davis <pa...@gmail.com>
AuthorDate: Mon May 8 17:33:02 2017 -0500
Apply the same optimization for prefix writes
---
src/couch/src/couch_btree.erl | 53 +++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 24 deletions(-)
diff --git a/src/couch/src/couch_btree.erl b/src/couch/src/couch_btree.erl
index 1a4ad60..029771a 100644
--- a/src/couch/src/couch_btree.erl
+++ b/src/couch/src/couch_btree.erl
@@ -465,10 +465,17 @@ write_node(Bt, _OldNode, NodeType, [_], NewList) ->
% saves us the disk space that would have been
% orphaned by not reusing the old node.
write_node(Bt, OldNode, NodeType, OldList, NewList) ->
- case is_append_only(OldList, NewList) of
- false ->
- write_node(Bt, NodeType, NewList);
- {true, Suffix} ->
+ case is_extension(OldList, NewList) of
+ {prefix, Prefix} ->
+ case old_node_full(OldList) of
+ true ->
+ {ok, Results} = write_node(Bt, NodeType, Prefix),
+ {OldLastKey, _} = lists:last(OldList),
+ {ok, Results ++ [{OldLastKey, OldNode}]};
+ false ->
+ write_node(Bt, NodeType, NewList)
+ end;
+ {suffix, Suffix} ->
case old_node_full(OldList) of
true ->
{ok, Results} = write_node(Bt, NodeType, Suffix),
@@ -476,28 +483,26 @@ write_node(Bt, OldNode, NodeType, OldList, NewList) ->
{ok, [{OldLastKey,OldNode} | Results]};
false ->
write_node(Bt, NodeType, NewList)
- end
+ end;
+ false ->
+ write_node(Bt, NodeType, NewList)
end.
-% This function will blow up if OldList == NewList
-% on purpose as an assertion that modify_node
-% doesn't provide this input.
-%
-% First clause finds our suff
-is_append_only([], [_ | _] = Suffix) ->
- {true, Suffix};
-
-% We've removed keys from the node
-is_append_only([_ | _], []) ->
- false;
-
-% We've found a mismatch of keys in the node
-is_append_only([KV1 | _], [KV2 | _]) when KV1 /= KV2 ->
- false;
-
-% Current key is equal, keep going
-is_append_only([KV | Rest1], [KV | Rest2]) ->
- is_append_only(Rest1, Rest2).
+is_extension(OldList, NewList) ->
+ case lists:suffix(OldList, NewList) of
+ true ->
+ PrefixLength = length(NewList) - length(OldList),
+ {Prefix, _} = lists:split(PrefixLength, NewList),
+ {prefix, Prefix};
+ false ->
+ case lists:prefix(OldList, NewList) of
+ true ->
+ Suffix = lists:nthtail(length(OldList), NewList),
+ {suffix, Suffix};
+ false ->
+ false
+ end
+ end.
old_node_full(OldList) ->
ChunkThreshold = get_chunk_size(),
--
To stop receiving notification emails like this one, please contact
"commits@couchdb.apache.org" <co...@couchdb.apache.org>.