Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/07/21 11:56:39 UTC

[GitHub] [couchdb] garrensmith opened a new pull request #3018: Fdb b+tree reduce

garrensmith opened a new pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018


   ## Overview
   
   Reduce on FDB using ebtree. This is a new reduce implementation that uses ebtree to compute the builtin reduce functions.
   It requires #3017 to be merged first.
   
   ## Testing recommendations
   
   <!-- Describe how we can test your changes.
        Does it provides any behaviour that the end users
        could notice? -->
   
   ## Related Issues or Pull Requests
   
   <!-- If your changes affects multiple components in different
        repositories please put links to those issues or pull requests here.  -->
   
   ## Checklist
   
   - [x] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
   - [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
garrensmith commented on pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018#issuecomment-662925449


   Thanks for taking a look @rnewson 
   
   > your 1 and 2 strike me as the wrong approach though I think I see your reasoning. The first item's claim to "store the reduce results" is misleading to the casual reader as, without something like ebtree and its storage of data on inner nodes, it's not possible to calculate useful intermediate reductions. What I think you're doing is reducing the k-v's emitted by a single document? If so, that is not something that CouchDB has done to date and seems to have limited value, certainly in the common case that a map function emits one row per document.
   
   > A simple design that uses less space over all would be to insert the emitted keys and values directly into ebtree and pass in a reducer function that calculates each of the desired reductions specified, and store those in a tuple or a map. couch_views can then call ebtree's functions for lookup (?key=), ranges (for ?startkey=X&endkey=Y&reduce=false) and the various reduce functions as needed (group=true, group_level=X, reduce=true). This ensures we only store each distinct thing once and the logic gets much simpler.
   
   This is what I'm currently doing. If a document emits the following from a map function:
   ```
   ([1, 1], 1)
   ([1, 1], 2)
   ([3, 3], 1)
   ```
   
   Then the reduce results for a `_sum` would be:
   ```
   ([1, 1], 3)
   ([3, 3], 1)
   ```
   
   I would then store those values in ebtree like this:
   
   ```
   (([1, 1], doc_id), (KVSize, 3))
   (([3, 3], doc_id), (KVSize, 1))
   ```
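
   A minimal sketch of the per-document step described above, for illustration only. The module name, `sum_by_key/1`, `to_tree_entries/2` and the `kv_size/2` placeholder are assumptions and are not taken from the PR; the tuple order `{KVSize, Val}` simply mirrors the example above.

   ```erlang
   -module(per_doc_reduce_sketch).
   -export([sum_by_key/1, to_tree_entries/2]).

   %% Collapse the rows emitted by one document into one `_sum` per distinct key.
   sum_by_key(EmittedKVs) ->
       Sums = lists:foldl(fun({Key, Val}, Acc) ->
           maps:update_with(Key, fun(Old) -> Old + Val end, Val, Acc)
       end, #{}, EmittedKVs),
       lists:sort(maps:to_list(Sums)).

   %% Build entries in the shape shown above: {{Key, DocId}, {KVSize, Val}}.
   to_tree_entries(DocId, ReducedKVs) ->
       [{{Key, DocId}, {kv_size(Key, Val), Val}} || {Key, Val} <- ReducedKVs].

   %% Placeholder size function (an assumption, not the PR's sizing logic).
   kv_size(Key, Val) ->
       erlang:external_size(Key) + erlang:external_size(Val).
   ```

   With the rows from the example, `per_doc_reduce_sketch:sum_by_key([{[1,1],1}, {[1,1],2}, {[3,3],1}])` returns `[{[1,1],3}, {[3,3],1}]`.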
   
   I know this differs from how CouchDB has done it previously, but I don't see what we would gain from storing the map index a second time. Querying the current map index should be faster than reading from the b-tree, since we can do a range scan of the map index. I'm really not comfortable using ebtree for the map index. We would need to determine how performant that would be and what advantages we would get. I would really like to keep ebtree just for the reduce part, and I'm very cautious about using it outside of that. I think we would start losing a lot of the functionality we get with FDB.
   
   > Finally, on insertion generally, if it's possible to do even a small amount of batching we'll reap considerable performance rewards. For the case where we update the view atomically with the document, obviously that can't happen, but for new indexes it would be good if we would update 10 or 20 documents per fdb txn that involves an ebtree. Insert performance would be approximately 10x / 20x faster.
   
   This is already done in `couch_views_indexer`. 
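
   To illustrate the batching idea in general terms, here is a rough sketch. It is not the `couch_views_indexer` code; the module name, `chunk/2`, `index_in_batches/3` and the `WriteBatch` callback are all hypothetical.

   ```erlang
   -module(batch_sketch).
   -export([chunk/2, index_in_batches/3]).

   %% Split a list into consecutive batches of at most N elements.
   chunk([], _N) ->
       [];
   chunk(List, N) when is_integer(N), N > 0 ->
       case length(List) > N of
           true ->
               {Batch, Rest} = lists:split(N, List),
               [Batch | chunk(Rest, N)];
           false ->
               [List]
       end.

   %% Call WriteBatch once per batch. In a real indexer each call would
   %% correspond to one FDB transaction (assumption for illustration).
   index_in_batches(Docs, BatchSize, WriteBatch) ->
       lists:foreach(WriteBatch, chunk(Docs, BatchSize)).
   ```

   For example, `batch_sketch:chunk([1,2,3,4,5], 2)` returns `[[1,2],[3,4],[5]]`.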





[GitHub] [couchdb] garrensmith closed pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
garrensmith closed pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018


   





[GitHub] [couchdb] garrensmith commented on pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
garrensmith commented on pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018#issuecomment-786028134


   No longer needed





[GitHub] [couchdb] garrensmith commented on pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
garrensmith commented on pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018#issuecomment-662866086


   This PR is ready for a first review. Some notes about the design:
   
   1. I don't store the map values in the b-tree. I only store the reduce results. That means the leaf nodes contain the reduce values and the size of the reduce values. I didn't want to store duplicates of the map k/v in the b-tree; that doesn't make any sense to me. 
   
   2. I have a separate b-tree for each reduce function. So if we have a map function that has multiple reduce functions each reduce is in its own b-tree. I've done this to limit the size of the k/v's and nodes in the b-tree so that we are less likely to exceed any of FDB's k/v size limits when we have a higher order for a b-tree. 
   
   3. I don't have very good tests yet for the size calculation. I'm still trying to decide the best approach for that. 
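
   As an illustration of note 1, here is a small sketch of a reducer whose values carry both a reduce value and a size, so that a full reduce of the tree yields the total size as well. It hard-codes `_sum` and uses the `{Value, Size}` order seen in `rereduce_val_and_size/2` in the diff; the module and function names are made up for this example.

   ```erlang
   -module(sum_with_size_sketch).
   -export([reducer/0]).

   %% Returns a reduce fun taking (Values, IsRereduce).
   reducer() ->
       fun
           (Vs, true) ->
               combine(Vs);
           (KVs, false) ->
               {_Keys, Vs} = lists:unzip(KVs),
               combine(Vs)
       end.

   %% An empty input reduces to a zero sum and a zero size.
   combine([]) ->
       {0, 0};
   combine(Vs) ->
       {Vals, Sizes} = lists:unzip(Vs),
       {lists:sum(Vals), lists:sum(Sizes)}.
   ```

   Because the leaves already hold reduced values, the reduce and rereduce branches do the same work once the keys are stripped off.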





[GitHub] [couchdb] garrensmith commented on a change in pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018#discussion_r459267563



##########
File path: src/couch_views/src/couch_views_reduce_fdb.erl
##########
@@ -0,0 +1,238 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_reduce_fdb).
+
+
+-export([
+    write_doc/5,
+    idx_prefix/3,
+    get_kv_size/3,
+    fold/8
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+write_doc(TxDb, Sig, Views, #{deleted := true} = Doc, ExistingViewKeys) ->
+    #{
+        id := DocId
+    } = Doc,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun(View) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+        lists:foreach(fun(ViewReduceFun) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+            delete_keys(Tx, Tree, DocId, ExistingKeys)
+
+        end, ViewReduceFuns)
+
+    end, Views);
+
+write_doc(TxDb, Sig, Views, #{reduce_results := ViewReduceResults, id := DocId},
+    ExistingViewKeys) ->
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun({View, ReduceResults}) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+
+        lists:foreach(fun({ViewReduceFun, ReduceResult}) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+
+            delete_keys(Tx, Tree, DocId, ExistingKeys),
+            add_keys(Tx, Tree, DocId, ReduceResult)
+
+        end, lists:zip(ViewReduceFuns, ReduceResults))
+    end, lists:zip(Views, ViewReduceResults));
+
+write_doc(_TxDb, _Sig, _Views, _Doc, _ExistingViewKeys) ->
+    ok.
+
+
+idx_prefix(DbPrefix, Sig, ViewId) ->
+    Key = {?DB_VIEWS, ?VIEW_DATA, Sig, ?VIEW_REDUCE_RANGE, ViewId},
+    erlfdb_tuple:pack(Key, DbPrefix).
+
+
+get_kv_size(TxDb, Mrst, ViewId) ->
+    #mrst {
+        views = Views,
+        sig = Sig
+    } = Mrst,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    [View] = lists:filter(fun(View) -> View#mrview.id_num == ViewId end, Views),
+    #mrview{
+        reduce_funs = ViewReduceFuns,
+        id_num = ViewId
+    } = View,
+
+    lists:foldl(fun(ReduceFun, Acc) ->
+        Tree = open_tree(TxDb, Sig, ViewId, ReduceFun),
+        {_, KVSize} = ebtree:full_reduce(Tx, Tree),
+        Acc + KVSize
+    end, 0, ViewReduceFuns).
+
+
+fold(Db, Sig, ViewId, Reducer, GroupLevel, Opts, UserCallback,
+    UserAcc0) ->
+
+    Acc0 = #{
+        user_callback => UserCallback,
+        user_acc => UserAcc0,
+        reducer => Reducer
+    },
+
+    fabric2_fdb:transactional(Db, fun(TxDb) ->
+        #{
+            tx := Tx
+        } = TxDb,
+
+        StartKey = fabric2_util:get_value(start_key, Opts),
+        EndKey = fabric2_util:get_value(end_key, Opts),
+
+        Tree = open_tree(TxDb, Sig, ViewId, Reducer),
+        GroupKeyFun = fun({Key, _DocId}) ->
+            couch_views_util:group_level_key(Key, GroupLevel)
+        end,
+
+        Acc1 = ebtree:group_reduce(Tx, Tree, StartKey, EndKey, GroupKeyFun,
+            fun fold_cb/2, Acc0, Opts),
+
+        maps:get(user_acc, Acc1)
+    end).
+
+
+get_existing_keys(ViewId, ExistingViewKeys) ->
+    case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+        {ViewId, _TotalRows, _TotalSize, EKeys} ->
+            EKeys;
+        false ->
+            []
+    end.
+
+
+% The reduce values are stored as keys in the b-tree
+% So every call to the reducer from the b-tree is always
+% a rereduce call.
+get_reducer({_, ReduceFun}) ->
+    get_reducer(ReduceFun);
+
+get_reducer(Reducer) ->
+    fun
+        (Vs, true) ->
+            rereduce_val_and_size(Reducer, Vs);
+        (KVs, false) ->
+            {_, Vs} = lists:unzip(KVs),
+            rereduce_val_and_size(Reducer, Vs)
+    end.
+
+
+% This happens if a reduce is called without the index being built
+% for example from `get_kv_size`
+rereduce_val_and_size(Reducer, []) ->
+    {0, 0};
+
+rereduce_val_and_size(Reducer, Vs) ->
+    {ReduceVs, SizeVs} = lists:unzip(Vs),
+    ReduceVal = couch_views_reducer:rereduce_values(Reducer, ReduceVs),
+    SizeVal = lists:sum(SizeVs),
+    {ReduceVal, SizeVal}.
+
+
+open_tree(TxDb, Sig, ViewId, ViewReduceFun) ->
+    #{
+        db_prefix := DbPrefix,
+        tx := Tx
+    } = TxDb,
+
+    ReduceId = couch_views_util:reduce_id(ViewId, ViewReduceFun),
+    ReduceIdxPrefix = idx_prefix(DbPrefix, Sig, ReduceId),
+
+    TreeOpts = [
+        {reduce_fun, get_reducer(ViewReduceFun)}
+    ],
+    ebtree:open(Tx, ReduceIdxPrefix, btree_order_size(), TreeOpts).
+
+
+delete_keys(Tx, Tree, DocId, Keys) ->
+    lists:foreach(fun (Key) ->
+        EK = create_key(Key, DocId),
+        ebtree:delete(Tx, Tree, EK)
+    end, Keys).
+
+
+add_keys(Tx, Tree, DocId, Results) ->
+    lists:foreach(fun ({Key, Val}) ->
+        EK = create_key(Key, DocId),
+        EV = create_val(Key, Val),
+        ebtree:insert(Tx, Tree, EK, EV)
+    end, Results).
+
+
+create_key(Key, DocId) ->
+    {Key, DocId}.
+
+
+create_val(Key, Val) ->
+    KeySize = erlang:external_size(Key),

Review comment:
       I've followed how the sizing is calculated in `couch_views_indexer`. There is a note there to change to `couch_ejson_size:encoded_size/1`, so we should probably change both in a separate PR.







[GitHub] [couchdb] rnewson commented on a change in pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
rnewson commented on a change in pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018#discussion_r459284232



##########
File path: src/couch_views/src/couch_views_reduce_fdb.erl
##########
@@ -0,0 +1,248 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_reduce_fdb).
+
+
+-export([
+    write_doc/5,
+    idx_prefix/3,
+    get_kv_size/3,
+    fold/8
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+write_doc(TxDb, Sig, Views, #{deleted := true} = Doc, ExistingViewKeys) ->
+    #{
+        id := DocId
+    } = Doc,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun(View) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+        lists:foreach(fun(ViewReduceFun) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+            delete_keys(Tx, Tree, DocId, ExistingKeys)
+
+        end, ViewReduceFuns)
+
+    end, Views);
+
+write_doc(TxDb, Sig, Views, #{reduce_results := ViewReduceResults, id := DocId},
+    ExistingViewKeys) ->
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun({View, ReduceResults}) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+
+        lists:foreach(fun({ViewReduceFun, ReduceResult}) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+
+            delete_keys(Tx, Tree, DocId, ExistingKeys),
+            add_keys(Tx, Tree, DocId, ReduceResult)
+
+        end, lists:zip(ViewReduceFuns, ReduceResults))
+    end, lists:zip(Views, ViewReduceResults));
+
+write_doc(_TxDb, _Sig, _Views, _Doc, _ExistingViewKeys) ->
+    ok.
+
+
+idx_prefix(DbPrefix, Sig, ViewId) ->
+    Key = {?DB_VIEWS, ?VIEW_DATA, Sig, ?VIEW_REDUCE_RANGE, ViewId},
+    erlfdb_tuple:pack(Key, DbPrefix).
+
+
+get_kv_size(TxDb, Mrst, ViewId) ->
+    #mrst {
+        views = Views,
+        sig = Sig
+    } = Mrst,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    [View] = lists:filter(fun(View) -> View#mrview.id_num == ViewId end, Views),
+    #mrview{
+        reduce_funs = ViewReduceFuns,
+        id_num = ViewId
+    } = View,
+
+    lists:foldl(fun(ReduceFun, Acc) ->
+        Tree = open_tree(TxDb, Sig, ViewId, ReduceFun),
+        {_, KVSize} = ebtree:full_reduce(Tx, Tree),
+        Acc + KVSize
+    end, 0, ViewReduceFuns).
+
+
+fold(Db, Sig, ViewId, Reducer, GroupLevel, Opts, UserCallback,
+    UserAcc0) ->
+
+    Acc0 = #{
+        user_callback => UserCallback,
+        user_acc => UserAcc0,
+        reducer => Reducer
+    },
+
+    fabric2_fdb:transactional(Db, fun(TxDb) ->
+        #{
+            tx := Tx
+        } = TxDb,
+
+        StartKey = fabric2_util:get_value(start_key, Opts),
+        EndKey = fabric2_util:get_value(end_key, Opts),
+
+        Tree = open_tree(TxDb, Sig, ViewId, Reducer),
+        GroupKeyFun = fun({Key, _DocId}) ->
+            couch_views_util:group_level_key(Key, GroupLevel)
+        end,
+
+        Acc1 = ebtree:group_reduce(Tx, Tree, StartKey, EndKey, GroupKeyFun,
+            fun fold_cb/2, Acc0, Opts),
+
+        maps:get(user_acc, Acc1)
+    end).
+
+
+get_existing_keys(ViewId, ExistingViewKeys) ->
+    case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+        {ViewId, _TotalRows, _TotalSize, EKeys} ->
+            EKeys;
+        false ->
+            []
+    end.
+
+
+% The reduce values are stored as keys in the b-tree

Review comment:
       Can you clarify here? I _think_ the reduce value you've calculated outside of ebtree is the reduction over the k-v's emitted by a single document?

##########
File path: src/couch_views/src/couch_views_reduce_fdb.erl
##########
@@ -0,0 +1,248 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_reduce_fdb).
+
+
+-export([
+    write_doc/5,
+    idx_prefix/3,
+    get_kv_size/3,
+    fold/8
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+write_doc(TxDb, Sig, Views, #{deleted := true} = Doc, ExistingViewKeys) ->
+    #{
+        id := DocId
+    } = Doc,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun(View) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+        lists:foreach(fun(ViewReduceFun) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+            delete_keys(Tx, Tree, DocId, ExistingKeys)
+
+        end, ViewReduceFuns)
+
+    end, Views);
+
+write_doc(TxDb, Sig, Views, #{reduce_results := ViewReduceResults, id := DocId},
+    ExistingViewKeys) ->
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun({View, ReduceResults}) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+
+        lists:foreach(fun({ViewReduceFun, ReduceResult}) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+
+            delete_keys(Tx, Tree, DocId, ExistingKeys),
+            add_keys(Tx, Tree, DocId, ReduceResult)
+
+        end, lists:zip(ViewReduceFuns, ReduceResults))
+    end, lists:zip(Views, ViewReduceResults));
+
+write_doc(_TxDb, _Sig, _Views, _Doc, _ExistingViewKeys) ->
+    ok.
+
+
+idx_prefix(DbPrefix, Sig, ViewId) ->
+    Key = {?DB_VIEWS, ?VIEW_DATA, Sig, ?VIEW_REDUCE_RANGE, ViewId},
+    erlfdb_tuple:pack(Key, DbPrefix).
+
+
+get_kv_size(TxDb, Mrst, ViewId) ->
+    #mrst {
+        views = Views,
+        sig = Sig
+    } = Mrst,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    [View] = lists:filter(fun(View) -> View#mrview.id_num == ViewId end, Views),

Review comment:
       `View = lists:keyfind(ViewId, #mrview.id_num, Views),` is the idiomatic way to do this.
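
       A small self-contained sketch of that idiom follows. The `#mrview{}` record here is a cut-down stand-in for the real one, and `find_view/2` is a hypothetical helper.

       ```erlang
       -module(keyfind_sketch).
       -export([find_view/2]).

       %% Stand-in for the real #mrview{} record, which has more fields.
       -record(mrview, {id_num, reduce_funs = []}).

       %% #mrview.id_num evaluates to the tuple position of the id_num field,
       %% which is what lists:keyfind/3 expects as its second argument.
       find_view(ViewId, Views) ->
           case lists:keyfind(ViewId, #mrview.id_num, Views) of
               #mrview{} = View -> {ok, View};
               false -> not_found
           end.
       ```

       Note that `lists:keyfind/3` returns `false` when nothing matches, whereas the `[View] = lists:filter(...)` form fails with a `badmatch`, so the caller has to decide how to handle a missing view.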







[GitHub] [couchdb] rnewson commented on a change in pull request #3018: Fdb b+tree reduce

Posted by GitBox <gi...@apache.org>.
rnewson commented on a change in pull request #3018:
URL: https://github.com/apache/couchdb/pull/3018#discussion_r458998415



##########
File path: src/couch_views/src/couch_views_reduce_fdb.erl
##########
@@ -0,0 +1,238 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_reduce_fdb).
+
+
+-export([
+    write_doc/5,
+    idx_prefix/3,
+    get_kv_size/3,
+    fold/8
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+write_doc(TxDb, Sig, Views, #{deleted := true} = Doc, ExistingViewKeys) ->
+    #{
+        id := DocId
+    } = Doc,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun(View) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+        lists:foreach(fun(ViewReduceFun) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+            delete_keys(Tx, Tree, DocId, ExistingKeys)
+
+        end, ViewReduceFuns)
+
+    end, Views);
+
+write_doc(TxDb, Sig, Views, #{reduce_results := ViewReduceResults, id := DocId},
+    ExistingViewKeys) ->
+    #{
+        tx := Tx
+    } = TxDb,
+
+    lists:foreach(fun({View, ReduceResults}) ->
+        #mrview{
+            reduce_funs = ViewReduceFuns,
+            id_num = ViewId
+        } = View,
+
+        ExistingKeys = get_existing_keys(ViewId, ExistingViewKeys),
+
+        lists:foreach(fun({ViewReduceFun, ReduceResult}) ->
+            Tree = open_tree(TxDb, Sig, ViewId, ViewReduceFun),
+
+            delete_keys(Tx, Tree, DocId, ExistingKeys),
+            add_keys(Tx, Tree, DocId, ReduceResult)
+
+        end, lists:zip(ViewReduceFuns, ReduceResults))
+    end, lists:zip(Views, ViewReduceResults));
+
+write_doc(_TxDb, _Sig, _Views, _Doc, _ExistingViewKeys) ->
+    ok.
+
+
+idx_prefix(DbPrefix, Sig, ViewId) ->
+    Key = {?DB_VIEWS, ?VIEW_DATA, Sig, ?VIEW_REDUCE_RANGE, ViewId},
+    erlfdb_tuple:pack(Key, DbPrefix).
+
+
+get_kv_size(TxDb, Mrst, ViewId) ->
+    #mrst {
+        views = Views,
+        sig = Sig
+    } = Mrst,
+
+    #{
+        tx := Tx
+    } = TxDb,
+
+    [View] = lists:filter(fun(View) -> View#mrview.id_num == ViewId end, Views),
+    #mrview{
+        reduce_funs = ViewReduceFuns,
+        id_num = ViewId
+    } = View,
+
+    lists:foldl(fun(ReduceFun, Acc) ->
+        Tree = open_tree(TxDb, Sig, ViewId, ReduceFun),
+        {_, KVSize} = ebtree:full_reduce(Tx, Tree),
+        Acc + KVSize
+    end, 0, ViewReduceFuns).
+
+
+fold(Db, Sig, ViewId, Reducer, GroupLevel, Opts, UserCallback,
+    UserAcc0) ->
+
+    Acc0 = #{
+        user_callback => UserCallback,
+        user_acc => UserAcc0,
+        reducer => Reducer
+    },
+
+    fabric2_fdb:transactional(Db, fun(TxDb) ->
+        #{
+            tx := Tx
+        } = TxDb,
+
+        StartKey = fabric2_util:get_value(start_key, Opts),
+        EndKey = fabric2_util:get_value(end_key, Opts),
+
+        Tree = open_tree(TxDb, Sig, ViewId, Reducer),
+        GroupKeyFun = fun({Key, _DocId}) ->
+            couch_views_util:group_level_key(Key, GroupLevel)
+        end,
+
+        Acc1 = ebtree:group_reduce(Tx, Tree, StartKey, EndKey, GroupKeyFun,
+            fun fold_cb/2, Acc0, Opts),
+
+        maps:get(user_acc, Acc1)
+    end).
+
+
+get_existing_keys(ViewId, ExistingViewKeys) ->
+    case lists:keyfind(ViewId, 1, ExistingViewKeys) of
+        {ViewId, _TotalRows, _TotalSize, EKeys} ->
+            EKeys;
+        false ->
+            []
+    end.
+
+
+% The reduce values are stored as keys in the b-tree
+% So every call to the reducer from the b-tree is always
+% a rereduce call.
+get_reducer({_, ReduceFun}) ->
+    get_reducer(ReduceFun);
+
+get_reducer(Reducer) ->
+    fun
+        (Vs, true) ->
+            rereduce_val_and_size(Reducer, Vs);
+        (KVs, false) ->
+            {_, Vs} = lists:unzip(KVs),
+            rereduce_val_and_size(Reducer, Vs)
+    end.
+
+
+% This happens if a reduce is called without the index being built
+% for example from `get_kv_size`
+rereduce_val_and_size(Reducer, []) ->
+    {0, 0};
+
+rereduce_val_and_size(Reducer, Vs) ->
+    {ReduceVs, SizeVs} = lists:unzip(Vs),
+    ReduceVal = couch_views_reducer:rereduce_values(Reducer, ReduceVs),
+    SizeVal = lists:sum(SizeVs),
+    {ReduceVal, SizeVal}.
+
+
+open_tree(TxDb, Sig, ViewId, ViewReduceFun) ->
+    #{
+        db_prefix := DbPrefix,
+        tx := Tx
+    } = TxDb,
+
+    ReduceId = couch_views_util:reduce_id(ViewId, ViewReduceFun),
+    ReduceIdxPrefix = idx_prefix(DbPrefix, Sig, ReduceId),
+
+    TreeOpts = [
+        {reduce_fun, get_reducer(ViewReduceFun)}
+    ],
+    ebtree:open(Tx, ReduceIdxPrefix, btree_order_size(), TreeOpts).
+
+
+delete_keys(Tx, Tree, DocId, Keys) ->
+    lists:foreach(fun (Key) ->
+        EK = create_key(Key, DocId),
+        ebtree:delete(Tx, Tree, EK)
+    end, Keys).
+
+
+add_keys(Tx, Tree, DocId, Results) ->
+    lists:foreach(fun ({Key, Val}) ->
+        EK = create_key(Key, DocId),
+        EV = create_val(Key, Val),
+        ebtree:insert(Tx, Tree, EK, EV)
+    end, Results).
+
+
+create_key(Key, DocId) ->
+    {Key, DocId}.
+
+
+create_val(Key, Val) ->
+    KeySize = erlang:external_size(Key),

Review comment:
       If this is intended to be the billing size, it should use the `couch_ejson_size:encoded_size/1` function, as is done currently. Check with Eric, who changed the external size away from `external_size` the last time.
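
       To show why the two measures can differ, here is a rough sketch that compares `erlang:external_size/1` with a hand-rolled JSON byte count. `json_size/1` is a hypothetical helper covering only integers, lists and binaries that need no escaping; it is not the `couch_ejson_size` implementation.

       ```erlang
       -module(size_sketch).
       -export([term_size/1, json_size/1]).

       %% Size of the term in Erlang's external term format.
       term_size(Term) ->
           erlang:external_size(Term).

       %% Hypothetical JSON byte count for a small subset of values.
       json_size(I) when is_integer(I) ->
           byte_size(integer_to_binary(I));
       json_size(B) when is_binary(B) ->
           byte_size(B) + 2;                     % surrounding quotes
       json_size([]) ->
           2;                                    % "[]"
       json_size(L) when is_list(L) ->
           Elems = lists:sum([json_size(E) || E <- L]),
           Elems + (length(L) - 1) + 2.          % commas plus brackets
       ```

       For the key `[1, 1]`, `size_sketch:json_size([1,1])` returns 5, the length of `[1,1]` as JSON text, while `size_sketch:term_size([1,1])` reports the size of the Erlang serialization of the same term.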



