Posted to commits@couchdb.apache.org by da...@apache.org on 2017/05/03 17:57:14 UTC
[couchdb] branch COUCHDB-3298-optimize-writing-kv-nodes updated
(2ecf6a5 -> e3b5f40)
This is an automated email from the ASF dual-hosted git repository.
davisp pushed a change to branch COUCHDB-3298-optimize-writing-kv-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git.
discards 2ecf6a5 Opimize writing KV node append writes
discards 3bf8e4c Revert "Revert "Ensure multi-item chunks in couch_btree:chunkify/1""
discards 21f3f16 Revert "Ensure multi-item chunks in couch_btree:chunkify/1"
discards e8ba6f8 Revert "Make couch_btree:chunkify/1 prefer fewer chunks"
adds 80963ca Don't crash compactor when compacting process fails
adds 1c94522 Add function to clean up after failed compaction
adds 81166c5 Expose couch_index_compactor:get_compacting_pid/1
adds f7767a3 Update should_remove test
adds c3ff408 Merge pull request #474 from cloudant/3364-fix-view-compactor-unknown_info
adds 7dc5dba .gitignore: rel/couchdb
adds 07aca80 Add and document default.d/local.d directories
adds 7c3c66e Improve local.d README file
adds b6a951b fix dev/run script for new default.d/local.d dirs
adds 0eee733 really fix dev/run
adds bc9bc4d build: trivial typo
adds 02817b1 Windows equipvalent of apache/couchdb#448
adds 5713b30 Avoid creation of document if deleting attachment on non-existent doc - Check existence of document before deleting its attachment - if document doesn’t exist, return 404 instead of creating new document
adds 4aba3bc Merge pull request #486 from cloudant/COUCHDB-3362-delete-att-on-nonexistdoc
adds 2bc93b9 Make _local_docs conform to include_docs
adds 49fb01d Adopt all_docs_reduce_to_count/1 to _local docs
adds 9c5dc07 Fix total_row for _local_docs
adds 64573fc Add utests for _local_docs
adds 0d00c10 Merge pull request #488 from cloudant/3337-fix-_local_docs
adds a5e3deb Re-enable attachment replication tests
adds e5550fb Merge pull request #489 from cloudant/COUCHDB-3174-re-enable-attachment-replication-tests
adds e4c3705 Fix stale shards cache
adds b71677f Use a temporary process when caching shard maps
adds c1c6891 Add unit tests for mem3_shards
adds a3ec4fe Merge pull request #476 from apache/COUCHDB-3376-fix-mem3-shards
adds b05b172 New couchup 1.x -> 2.x database migration tool
adds 1111e60 Merge pull request #483 from apache/feat-couchup
adds 28dd801 bump documentation version
adds e1e1636 Revert "fix compiler and dialyzer warnings"
adds bcd718b Revert "Add sys_dbs to the LRU"
adds 8d888d7 Adjust reverted code to new couch_lru API
adds 350a67b Merge pull request #490 from cloudant/revert-Add-sys_dbs-to-LRU
adds 9718b97 Introduce couch_replicator_scheduler
adds 9895a73 Cluster ownership module implementation
adds 4d2969d Implement multi-db shard change monitoring
adds 2505436 Share connections between replications
adds d3d9097 AIMD based rate limiter implementation
adds d89f21b Refactor utils into 3 modules
adds dcfa090 Implement replication document processor
adds 4841774 Stitch scheduling replicator together.
adds 6df8cf6 Add `_scheduler/{jobs,docs}` API endpoints
adds f7a711d Merge pull request #470 from apache/63012-scheduler
adds 27d1223 Disabling replication startup jitter in Windows makefile
adds c40b232 Merge pull request #500 from cloudant/couchdb-3324-windows-makefile-fix
adds 8e83c42 snap --> couchdb-pkg repo
adds fa2bcb5 bump for next release
adds 2689507 Fix error on race condition in mem3 startup
adds 879e7eb bump version.mk
adds d1b16e2 Revert "Fix error on race condition in mem3 startup"
adds cc5a552 build: pull authors out of subrepos
adds 4e7e9ee Merge pull request #312 from robertkowalski/build-thanks
adds 33a33c1 Update rebar with new fauxton tag
adds 63278f2 Bump docs to include scheduling replicator documentation
adds 81ee7c5 Fix error on race condition in mem3 startup
adds 235bd06 Fix and re-enable many test cases
adds ecb67e3 bypass compact.js flaky comparison
adds f9e2e5a Fix `badarg` when querying replicator's _scheduler/docs endpoint
adds 4e983fc Merge pull request #503 from cloudant/couchdb-3324-fix-badarg
new e3b5f40 Opimize writing KV node append writes
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O   (2ecf6a5)
            \
             N -- N -- N   refs/heads/COUCHDB-3298-optimize-writing-kv-nodes (e3b5f40)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omits" are not gone; other references still
refer to them. Any revisions marked "discards" are gone forever.
The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "adds" were already present in the repository and have only
been added to this reference.
Summary of changes:
.gitignore | 1 +
Makefile | 19 +-
Makefile.win | 2 +-
build-aux/print-committerlist.sh | 68 +
configure | 2 +-
dev/run | 17 +
rebar.config.script | 4 +-
rel/files/eunit.ini | 3 +
rel/overlay/bin/couchdb.cmd | 2 +-
rel/overlay/bin/couchup | 480 +++++++
rel/overlay/etc/default.d/README | 11 +
rel/overlay/etc/default.ini | 14 +-
rel/overlay/etc/local.d/README | 8 +
rel/reltool.config | 2 +-
rel/snap.ini | 10 -
rel/snap_run | 9 -
rel/snapcraft.yaml | 47 -
src/chttpd/src/chttpd_db.erl | 4 +
src/chttpd/src/chttpd_httpd_handlers.erl | 1 +
src/chttpd/test/chttpd_db_test.erl | 84 +-
src/couch/src/couch_btree.erl | 6 +-
src/couch/src/couch_db_updater.erl | 7 +-
src/couch/src/couch_file.erl | 22 +-
src/couch/src/couch_httpd_db.erl | 4 +
src/couch/src/couch_multidb_changes.erl | 860 ++++++++++++
src/couch/src/couch_server.erl | 93 +-
src/couch/test/couchdb_views_tests.erl | 2 +
src/couch_index/src/couch_index.erl | 12 +-
src/couch_index/src/couch_index_compactor.erl | 14 +-
src/couch_mrview/src/couch_mrview.erl | 26 +-
src/couch_mrview/src/couch_mrview_compactor.erl | 9 +-
src/couch_mrview/src/couch_mrview_index.erl | 6 +-
src/couch_mrview/src/couch_mrview_test_util.erl | 16 +-
src/couch_mrview/src/couch_mrview_util.erl | 13 +-
.../test/couch_mrview_compact_tests.erl | 31 +-
.../test/couch_mrview_local_docs_tests.erl | 132 ++
src/couch_replicator/README.md | 292 ++++
src/couch_replicator/priv/stats_descriptions.cfg | 96 ++
src/couch_replicator/src/couch_replicator.app.src | 26 +-
src/couch_replicator/src/couch_replicator.erl | 1219 ++++-------------
src/couch_replicator/src/couch_replicator.hrl | 35 +-
.../src/couch_replicator_api_wrap.erl | 50 +-
.../src/couch_replicator_api_wrap.hrl | 3 +-
.../src/couch_replicator_clustering.erl | 243 ++++
.../src/couch_replicator_connection.erl | 237 ++++
.../src/couch_replicator_db_changes.erl | 108 ++
.../src/couch_replicator_doc_processor.erl | 973 +++++++++++++
.../src/couch_replicator_doc_processor_worker.erl | 284 ++++
src/couch_replicator/src/couch_replicator_docs.erl | 756 ++++++++++
.../src/couch_replicator_fabric.erl | 155 +++
.../src/couch_replicator_fabric_rpc.erl | 97 ++
.../src/couch_replicator_filters.erl | 214 +++
.../src/couch_replicator_httpc.erl | 123 +-
.../src/couch_replicator_httpc_pool.erl | 79 +-
.../src/couch_replicator_httpd.erl | 77 +-
.../src/couch_replicator_httpd_util.erl | 201 +++
src/couch_replicator/src/couch_replicator_ids.erl | 127 ++
.../src/couch_replicator_job_sup.erl | 7 +-
.../src/couch_replicator_js_functions.hrl | 8 +-
.../src/couch_replicator_manager.erl | 1034 +-------------
.../src/couch_replicator_rate_limiter.erl | 262 ++++
.../src/couch_replicator_rate_limiter_tables.erl | 62 +
.../src/couch_replicator_scheduler.erl | 1446 ++++++++++++++++++++
.../src/couch_replicator_scheduler.hrl | 4 +-
...ator.erl => couch_replicator_scheduler_job.erl} | 530 ++++---
.../src/couch_replicator_scheduler_sup.erl | 62 +
src/couch_replicator/src/couch_replicator_sup.erl | 54 +-
.../src/couch_replicator_utils.erl | 583 ++------
.../src/couch_replicator_worker.erl | 3 +
.../test/couch_replicator_compact_tests.erl | 30 +-
.../test/couch_replicator_connection_tests.erl | 241 ++++
.../test/couch_replicator_httpc_pool_tests.erl | 2 +-
.../test/couch_replicator_many_leaves_tests.erl | 24 +-
.../test/couch_replicator_modules_load_tests.erl | 11 +-
.../test/couch_replicator_proxy_tests.erl | 69 +
.../test/couch_replicator_rate_limiter_tests.erl | 89 ++
...ch_replicator_small_max_request_size_target.erl | 7 +-
.../test/couch_replicator_test_helper.erl | 22 +-
.../couch_replicator_use_checkpoints_tests.erl | 24 +-
src/mem3/src/mem3_shards.erl | 340 ++++-
src/mem3/src/mem3_util.erl | 7 +-
test/javascript/couch.js | 4 +-
test/javascript/tests/auth_cache.js | 42 +-
test/javascript/tests/compact.js | 4 +-
test/javascript/tests/config.js | 58 +-
test/javascript/tests/delayed_commits.js | 124 +-
test/javascript/tests/erlang_views.js | 8 +-
test/javascript/tests/oauth_users_db.js | 2 +-
test/javascript/tests/proxyauth.js | 26 +-
test/javascript/tests/replicator_db_bad_rep_id.js | 3 +-
test/javascript/tests/rev_stemming.js | 16 +-
test/javascript/tests/rewrite.js | 9 +-
test/javascript/tests/rewrite_js.js | 1 -
test/javascript/tests/stats.js | 86 +-
test/javascript/tests/view_collation_raw.js | 1 -
test/javascript/tests/view_compaction.js | 6 +-
version.mk | 2 +-
97 files changed, 9391 insertions(+), 3358 deletions(-)
create mode 100755 build-aux/print-committerlist.sh
create mode 100755 rel/overlay/bin/couchup
create mode 100644 rel/overlay/etc/default.d/README
create mode 100644 rel/overlay/etc/local.d/README
delete mode 100644 rel/snap.ini
delete mode 100755 rel/snap_run
delete mode 100644 rel/snapcraft.yaml
create mode 100644 src/couch/src/couch_multidb_changes.erl
create mode 100644 src/couch_mrview/test/couch_mrview_local_docs_tests.erl
create mode 100644 src/couch_replicator/README.md
create mode 100644 src/couch_replicator/src/couch_replicator_clustering.erl
create mode 100644 src/couch_replicator/src/couch_replicator_connection.erl
create mode 100644 src/couch_replicator/src/couch_replicator_db_changes.erl
create mode 100644 src/couch_replicator/src/couch_replicator_doc_processor.erl
create mode 100644 src/couch_replicator/src/couch_replicator_doc_processor_worker.erl
create mode 100644 src/couch_replicator/src/couch_replicator_docs.erl
create mode 100644 src/couch_replicator/src/couch_replicator_fabric.erl
create mode 100644 src/couch_replicator/src/couch_replicator_fabric_rpc.erl
create mode 100644 src/couch_replicator/src/couch_replicator_filters.erl
create mode 100644 src/couch_replicator/src/couch_replicator_httpd_util.erl
create mode 100644 src/couch_replicator/src/couch_replicator_ids.erl
create mode 100644 src/couch_replicator/src/couch_replicator_rate_limiter.erl
create mode 100644 src/couch_replicator/src/couch_replicator_rate_limiter_tables.erl
create mode 100644 src/couch_replicator/src/couch_replicator_scheduler.erl
copy rel/files/sys.config => src/couch_replicator/src/couch_replicator_scheduler.hrl (90%)
copy src/couch_replicator/src/{couch_replicator.erl => couch_replicator_scheduler_job.erl} (80%)
create mode 100644 src/couch_replicator/src/couch_replicator_scheduler_sup.erl
create mode 100644 src/couch_replicator/test/couch_replicator_connection_tests.erl
create mode 100644 src/couch_replicator/test/couch_replicator_proxy_tests.erl
create mode 100644 src/couch_replicator/test/couch_replicator_rate_limiter_tests.erl
--
To stop receiving notification emails like this one, please contact
"commits@couchdb.apache.org" <co...@couchdb.apache.org>.
[couchdb] 01/01: Opimize writing KV node append writes
Posted by da...@apache.org.
davisp pushed a commit to branch COUCHDB-3298-optimize-writing-kv-nodes
in repository https://gitbox.apache.org/repos/asf/couchdb.git
commit e3b5f40d9130a6d347b02f31a74fd5300b41bc6e
Author: Paul J. Davis <pa...@gmail.com>
AuthorDate: Wed May 3 12:27:08 2017 -0500
Opimize writing KV node append writes
As it turns out, the original change in COUCHDB-3298 ends up hurting
disk usage when a view emits large amounts of data (i.e., more than
half of the btree chunk size). The cause is that instead of writing
single-element nodes it prefers to write kv nodes with three elements.
While we might normally prefer this in memory, with our append-only
storage it leaves significantly more garbage on disk.
We can show this with a few trivial examples. Imagine we write KVs a
through f. The following two patterns show the nodes as we write each
new KV.
Before 3298:
[]
[a]
[a, b]
[a, b]', [c]
[a, b]', [c, d]
[a, b]', [c, d]', [e]
[a, b]', [c, d]', [e, f]
After 3298:
[]
[a]
[a, b]
[a, b, c]
[a, b]', [c, d]
[a, b]', [c, d, e]
[a, b]', [c, d]', [e, f]
The thing to realize here is which of these nodes end up as garbage. In
the first example the nodes [a], [a, b], [c], [c, d], and [e] have been
orphaned, whereas in the second case the nodes [a], [a, b], [a, b, c],
[c, d], and [c, d, e] have been orphaned. As a quick aside, the reason
that [a, b] and [c, d] are orphaned is due to how a btree update works.
For instance, when adding c, we read [a, b] into memory, append c, and
then during our node write we call chunkify, which gives us back
[a, b], [c]. That leads us to write [a, b] a second time.
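The two patterns above can be reproduced with a small model. This is only a sketch under stated assumptions: the module `kv_garbage_demo` and its functions are hypothetical and not part of couch_btree, and the two chunking funs stand in for chunkify by counting items instead of measuring byte sizes.

```erlang
-module(kv_garbage_demo).
-export([run/2, chunk_pairs/1, chunk_lazy/1]).

%% Toy stand-in for chunkify before COUCHDB-3298 (assumption: counts
%% items rather than bytes). Cuts a new chunk after every two items.
chunk_pairs([A, B | Rest]) when Rest =/= [] ->
    [[A, B] | chunk_pairs(Rest)];
chunk_pairs(Items) ->
    [Items].

%% Toy stand-in for chunkify after COUCHDB-3298: keeps up to three
%% items in a single node and only splits into pairs beyond that.
chunk_lazy(Items) when length(Items) =< 3 ->
    [Items];
chunk_lazy(Items) ->
    chunk_pairs(Items).

%% Appends Keys one at a time and returns {LiveNodes, Garbage}, where
%% Garbage lists the orphaned nodes in the order they were orphaned.
run(Chunkify, Keys) ->
    {Live, Garbage} = lists:foldl(
        fun(Key, Acc) -> append_key(Chunkify, Key, Acc) end,
        {[], []},
        Keys),
    {Live, lists:reverse(Garbage)}.

%% The first key creates the first node; nothing is orphaned.
append_key(Chunkify, Key, {[], Garbage}) ->
    {Chunkify([Key]), Garbage};
%% Otherwise the last node is read, extended, re-chunked and written
%% again, so the old copy of that node becomes garbage.
append_key(Chunkify, Key, {Live, Garbage}) ->
    {Kept, [Last]} = lists:split(length(Live) - 1, Live),
    {Kept ++ Chunkify(Last ++ [Key]), [Last | Garbage]}.
```

Calling `kv_garbage_demo:run(fun kv_garbage_demo:chunk_pairs/1, [a,b,c,d,e,f])` models the "Before 3298" pattern, and passing `fun kv_garbage_demo:chunk_lazy/1` models the "After 3298" pattern.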
This patch changes the write function to recognize when we're merely
appending KVs, which saves us this extra write and the garbage it
generates. The resulting node patterns look like this:
[]
[a]
[a, b]
[a, b], [c]
[a, b], [c, d]
[a, b], [c, d], [e]
[a, b], [c, d], [e, f]
This means we only end up generating [a], [c], and [e] as garbage (with
respect to kv nodes; kp nodes retain their historical behavior).
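The append-only behavior can be modeled in the same spirit. Again this is a hedged sketch, not the patch itself: the module `kv_append_demo` is hypothetical, and the `FULL` threshold of two items is an assumption that replaces the real byte-size check against the chunk size. In this toy model the "old node is full" test also covers the single-element case, which the patch handles with a separate clause.

```erlang
-module(kv_append_demo).
-export([run/1]).

%% Toy threshold (assumption): a node holding two KVs counts as full.
-define(FULL, 2).

%% Appends Keys one at a time and returns {LiveNodes, Garbage}, where
%% Garbage lists the orphaned nodes in the order they were orphaned.
run(Keys) ->
    {Live, Garbage} = lists:foldl(fun append_key/2, {[], []}, Keys),
    {Live, lists:reverse(Garbage)}.

%% The first key creates the first node; nothing is orphaned.
append_key(Key, {[], Garbage}) ->
    {[[Key]], Garbage};
append_key(Key, {Live, Garbage}) ->
    {Kept, [Last]} = lists:split(length(Live) - 1, Live),
    case length(Last) >= ?FULL of
        true ->
            %% Old node is full: reuse it as-is and write only the
            %% appended suffix as a new node. No garbage is produced.
            {Live ++ [[Key]], Garbage};
        false ->
            %% Old node is not full: rewrite it with the new key, which
            %% orphans the previous copy.
            {Kept ++ [Last ++ [Key]], [Last | Garbage]}
    end.
```

Running `kv_append_demo:run([a,b,c,d,e,f])` yields the live nodes [a, b], [c, d], [e, f] with only [a], [c], and [e] recorded as garbage, matching the pattern listed above.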
---
src/couch/src/couch_btree.erl | 58 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)
diff --git a/src/couch/src/couch_btree.erl b/src/couch/src/couch_btree.erl
index adbc92b..492e53b 100644
--- a/src/couch/src/couch_btree.erl
+++ b/src/couch/src/couch_btree.erl
@@ -396,7 +396,8 @@ modify_node(Bt, RootPointerInfo, Actions, QueryOutput) ->
         {LastKey, _LastValue} = element(tuple_size(NodeTuple), NodeTuple),
         {ok, [{LastKey, RootPointerInfo}], QueryOutput2};
     _Else2 ->
-        {ok, ResultList} = write_node(Bt, NodeType, NewNodeList),
+        {ok, ResultList} = write_node(
+                Bt, RootPointerInfo, NodeType, NodeList, NewNodeList),
         {ok, ResultList, QueryOutput2}
     end.
 
@@ -440,6 +441,61 @@ write_node(#btree{fd = Fd, compression = Comp} = Bt, NodeType, NodeList) ->
     ],
     {ok, ResultList}.
 
+% Don't make our append-only write optimization for
+% kp nodes.
+write_node(Bt, _OldNode, kp_node, _OldList, NewList) ->
+    write_node(Bt, kp_node, NewList);
+
+% If we're creating a new kv node then there's no
+% possibility for the optimization
+write_node(Bt, _OldNode, NodeType, [], NewList) ->
+    write_node(Bt, NodeType, NewList);
+
+% Disable the optimization for nodes that only
+% have a single element so we don't end up increasing
+% the number of reads when folding a btree
+write_node(Bt, _OldNode, NodeType, [_], NewList) ->
+    write_node(Bt, NodeType, NewList);
+
+% If a KV node has had a new key appended to the
+% end of its list we can instead take the appended
+% KVs and create a new node while reusing the old
+% node already on disk. This saves us both the effort
+% of writing data that's already on disk as well as
+% saves us the disk space that would have been
+% orphaned by not reusing the old node.
+write_node(Bt, OldNode, NodeType, OldList, NewList) ->
+    case is_append_only(OldList, NewList) of
+        false ->
+            write_node(Bt, NodeType, NewList);
+        {true, Suffix} ->
+            case old_node_full(OldList) of
+                true ->
+                    {ok, Results} = write_node(Bt, NodeType, Suffix),
+                    {OldLastKey, _} = lists:last(OldList),
+                    {ok, [{OldLastKey,OldNode} | Results]};
+                false ->
+                    write_node(Bt, NodeType, NewList)
+            end
+    end.
+
+% This function will blow up if OldList == NewList
+% on purpose as an assertion that modify_node
+% doesn't provide this input.
+is_append_only([], [_ | _] = Suffix) ->
+    {true, Suffix};
+is_append_only([KV1 | _], [KV2 | _]) when KV1 /= KV2 ->
+    false;
+is_append_only([KV | Rest1], [KV | Rest2]) ->
+    is_append_only(Rest1, Rest2).
+
+old_node_full(OldList) ->
+    ChunkThreshold = get_chunk_size(),
+    NodeSize = lists:foldl(fun(KV, Acc) ->
+        Acc + ?term_size(KV)
+    end, 0, OldList),
+    NodeSize >= ChunkThreshold.
+
 modify_kpnode(Bt, {}, _LowerBound, Actions, [], QueryOutput) ->
     modify_node(Bt, nil, Actions, QueryOutput);
 modify_kpnode(_Bt, NodeTuple, LowerBound, [], ResultNode, QueryOutput) ->