You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/09/03 20:25:13 UTC

[GitHub] [couchdb] davisp opened a new pull request #3127: Prototype/fdb layer replace couch rate

davisp opened a new pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127


   ## Overview
   
   Replace the couch_rate rate limiting algorithm with couch_views_batch to dynamically optimize the indexer batch size during view updates.
   
   While working on optimizing couch_views I ran into couch_rate behaving oddly. Reading through the implementation it was nearly impossible to predict how changing various parameters would affect the behavior under load. The rate limiting algorithms were also a bit misapplied. We don't really want a rate "limiter" so much as a rate "maximizer". 
   
   After failing to comprehend couch_rate and realizing that it was missing some fairly important signals (specifically, the approximate transaction size) I decided to take a whack at simplifying things. The new approach is quite a bit simpler and results in the ability to both successfully index large views (1,000,000 doc ranges were tested) but also has a more predictable and simple behavior.
   
   ![batch_size_search](https://user-images.githubusercontent.com/19929/92163250-f6697480-edf8-11ea-9553-9da2be9f1152.png)
   
   The graph above is a representative of the behavior. Batch sizes start out small when a view indexer starts. The batch size is quickly ramped up to search for the threshold of a batch size, and then batch sizes are adjusted in much smaller increments to follow the optimal batch size.
   
   Matching the old behavior of couch_rate to attempt to maintain a consistent transaction time limit can be accomplished trivially by adjusting the `couch_views.batch_tx_max_time` parameter.
   
   ## Testing recommendations
   
   `make check`
   
   ## Checklist
   
   - [x] Code is written and works correctly
   - [x] Changes are covered by tests
   - [x] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
   - [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-692821860


   Updated the max size parameter to include bytes. Will merge when CI comes back green.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485051049



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(validate_opts(St));
+
+start(#batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(State::term()) -> NewState::term().
+failure(#batch_st{} = St) ->
+    St#batch_st{
+        size = erlang:max(1, St#batch_st.size div 2),
+        state = sense
+    }.
+
+
+validate_opts(St) ->
+    #batch_st{
+        size = Size,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = Penalty
+    } = St,
+    St#batch_st{
+        size = non_neg_integer(Size, batch_initial_size),
+        search_incr = non_neg_integer(SearchIncr, batch_search_increment),
+        sense_incr = non_neg_integer(SenseIncr, batch_sense_increment),
+        max_tx_size = non_neg_integer(MaxTxSize, batch_max_tx_size),
+        max_tx_time = non_neg_integer(MaxTxTime, batch_max_tx_time),
+        threshold_penalty = float_0_to_1(Penalty, batch_threshold_penalty)
+    }.
+
+
+get_config(Key, Default) ->
+    config:get("couch_views", Key, Default).
+
+
+non_neg_integer(Str, Name) ->
+    try
+        Val = list_to_integer(Str),
+        true = Val > 0,
+        Val
+    catch _:_ ->
+        erlang:error({invalid_non_neg_integer, {couch_views, Name, Str}})
+    end.
+
+
+float_0_to_1(Str, Name) ->
+    Val = try
+        list_to_float(Str)
+    catch error:badarg ->
+        erlang:error({invalid_float, {couch_views, Name, Str}})
+    end,
+    if Val >= 0.0 andalso Val =< 1.0 -> Val; true ->
+        erlang:error({float_out_of_range, {couch_views, Name, Str}})
+    end.
+
+
+load_state() ->
+    get(?MODULE).
+
+
+save_state(Mod, Batch) ->

Review comment:
       Done.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),

Review comment:
       Done.

##########
File path: src/couch_views/README.md
##########
@@ -16,3 +17,18 @@ Code layout:
 * `couch_views_server` - Spawns `couch_views_indexer` workers to handle index update jobs.
 * `couch_views_updater` - Update interactive indexes during doc update transactions
 * `couch_views_util` - Various utility functions
+
+# Configuration
+
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       Done.

##########
File path: rel/overlay/etc/default.ini
##########
@@ -309,6 +309,14 @@ iterations = 10 ; iterations for password hashing
 ;
 ; The maximum allowed value size emitted from a view for a document (in bytes)
 ;value_size_limit = 64000
+;
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485341802



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -78,10 +86,10 @@ init() ->
         fail_job(Job, Data, sig_changed, "Design document was modified")
     end,
 
-    Limiter = couch_rate:create_if_missing({DbName, DDocId}, "views"),
-
     State = #{
         tx_db => undefined,
+        tx_size => 0,

Review comment:
       Yeah, makes sense. I was thinking of low cardinality ones just see a range of transaction sizes and estimated limits, but you're right it would bounce around too much and not be very useful.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485144522



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.

Review comment:
       With your latest rewrite and separation into two modules it is not required to use `opaque`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp merged pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp merged pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r486637664



##########
File path: src/couch_views/src/couch_views_batch_impl.erl
##########
@@ -0,0 +1,147 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_impl).
+
+-behavior(couch_views_batch).
+
+
+-export([
+    start/2,
+    success/4,
+    failure/2
+]).
+
+
+-include("couch_mrview/include/couch_mrview.hrl").
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time_msec,
+    threshold_penalty
+}).
+
+
+-spec start(
+        Mrst::#mrst{},
+        State::term()
+    ) -> {NewState::term(), BatchSize::pos_integer()}.
+start(Mrst, undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time_msec = get_config("batch_max_tx_time_msec", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")

Review comment:
       Done.

##########
File path: src/couch_views/src/couch_views_batch_impl.erl
##########
@@ -0,0 +1,147 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_impl).
+
+-behavior(couch_views_batch).
+
+
+-export([
+    start/2,
+    success/4,
+    failure/2
+]).
+
+
+-include("couch_mrview/include/couch_mrview.hrl").
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time_msec,
+    threshold_penalty
+}).
+
+
+-spec start(
+        Mrst::#mrst{},
+        State::term()
+    ) -> {NewState::term(), BatchSize::pos_integer()}.
+start(Mrst, undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time_msec = get_config("batch_max_tx_time_msec", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(Mrst, validate_opts(St));
+
+start(_Mrst, #batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        Mrst::#mrst{},
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(_Mrst, TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time_msec = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(Mrst::#mrst{}, State::term()) -> NewState::term().
+failure(_Mrst, #batch_st{} = St) ->
+    St#batch_st{
+        size = erlang:max(1, St#batch_st.size div 2),
+        state = sense
+    }.
+
+
+validate_opts(St) ->
+    #batch_st{
+        size = Size,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time_msec = MaxTxTime,
+        threshold_penalty = Penalty
+    } = St,
+    St#batch_st{
+        size = non_neg_integer(Size, batch_initial_size),
+        search_incr = non_neg_integer(SearchIncr, batch_search_increment),
+        sense_incr = non_neg_integer(SenseIncr, batch_sense_increment),
+        max_tx_size = non_neg_integer(MaxTxSize, batch_max_tx_size),
+        max_tx_time_msec = non_neg_integer(MaxTxTime, batch_max_tx_time_msec),
+        threshold_penalty = float_0_to_1(Penalty, batch_threshold_penalty)

Review comment:
       Done.

##########
File path: src/couch_views/README.md
##########
@@ -7,42 +7,28 @@ Currently only map indexes are supported and it will always return the full inde
 Code layout:
 
 * `couch_views` - Main entry point to query a view
-* `couch_views_reader` - Reads from the index for queries
+* `couch_views_batch` - Dynamically determine optimal batch sizes for view indexers.
+* `couch_views_encoding` - Encodes view keys that are byte comparable following CouchDB view sort order.
+* `couch_views_fdb` - Maps view operations to FoundationDB logic.
+* `couch_views_http` - View specific helpers for chttpd
 * `couch_views_indexer` - `couch_jobs` worker that builds an index from the changes feed.
+* `couch_views_reader` - Reads from the index for queries
 * `couch_vews_jobs` - `couch_views` interactions with `couch_jobs`. It handles adding index jobs and subscribes to jobs.
-* `couch_views_fdb` - Maps view operations to FoundationDB logic.
-* `couch_views_encoding` - Encodes view keys that are byte comparable following CouchDB view sort order.
 * `couch_views_server` - Spawns `couch_views_indexer` workers to handle index update jobs.
+* `couch_views_updater` - Update interactive indexes during doc update transactions
+* `couch_views_util` - Various utility functions
 
 # Configuration
 
-## Configuring rate limiter
-
-Here is the example of configuration used in `couch_view` application:
-
-```
-[couch_rate.views]
-limiter = couch_rate_limiter
-opts = #{budget => 100, target => 2500, window => 60000, sensitivity => 1000}
-```
-
-Supported fields in `opts`:
-
-* `budget` - the initial value for estimated batch size
-* `target` - the amount in msec which we try to maintain for batch processing time
-* `window` - time interval for contention detector
-* `sensitivity` - minimal interval within the `window`
-
-Unsupported fields in `opts` (if you really know what you are doing):
-
-* `window_size` - how many batches to consider in contention detector
-* `timer` - this is used for testing to fast forward time `fun() -> current_time_in_ms() end`
-* `target` - the amount in msec which we try to maintain for batch processing time
-* `underload_threshold` - a threshold below which we would try to increase the budget
-* `overload_threshold` - a threshold above which we would start decreasing the budget
-* `delay_threshold` - a threshold above which we would start introducing delays between batches
-* `multiplicative_factor` - determines how fast we are going to decrease budget (must be in (0..1) range)
-* `regular_delay` - delay between batches when there is no overload
-* `congested_delay` - delay between batches when there is an overload
-* `initial_budget` - initial value for budget to start with
-
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       Done.

##########
File path: src/couch_views/test/couch_views_batch_test.erl
##########
@@ -0,0 +1,83 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_test).
+
+
+-include_lib("eunit/include/eunit.hrl").
+-include_lib("fabric/test/fabric2_test.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+batch_test_() ->
+    {
+        "Test view batch sizing",
+        {
+            setup,
+            fun setup/0,
+            fun cleanup/1,
+            with([
+                ?TDEF(basic),
+                ?TDEF(search_success),
+                ?TDEF(sense_success),
+                ?TDEF(failure),
+                ?TDEF(failure_switches_to_sense)
+            ])
+        }
+    }.
+
+
+setup() ->
+    test_util:start_couch().
+
+
+cleanup(Ctx) ->
+    test_util:stop_couch(Ctx).
+
+
+basic(_) ->
+    erase(couch_views_batch),
+    ?assertEqual(100, couch_views_batch:start(#mrst{})).
+
+
+search_success(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:success(#mrst{}, 0, 0),
+    ?assertEqual(600, couch_views_batch:start(#mrst{})).
+
+
+sense_success(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    % Exceeding our threshold switches from search to sense
+    couch_views_batch:success(#mrst{}, 10000000, 5000),
+    ?assertEqual(80, couch_views_batch:start(#mrst{})),
+    couch_views_batch:success(#mrst{}, 0, 0),
+    ?assertEqual(180, couch_views_batch:start(#mrst{})).
+
+
+failure(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:failure(#mrst{}),
+    ?assertEqual(50, couch_views_batch:start(#mrst{})).
+
+
+failure_switches_to_sense(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:failure(#mrst{}),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:success(#mrst{}, 0, 0),
+    ?assertEqual(150, couch_views_batch:start(#mrst{})).
+

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-689052891


   @nickva I've addressed your three main suggestions. For the metrics I don't have a good answer other than maybe expose the timing logs I've currently somewhat hacked in the views branch. Let me know if you think some sort of combined metric there would be useful and I can add that.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484888421



##########
File path: src/couch_views/README.md
##########
@@ -16,3 +17,18 @@ Code layout:
 * `couch_views_server` - Spawns `couch_views_indexer` workers to handle index update jobs.
 * `couch_views_updater` - Update interactive indexes during doc update transactions
 * `couch_views_util` - Various utility functions
+
+# Configuration
+
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       `batch_max_tx_time_msec` would be better 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r486640826



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       If you look at the old implementation `WrittenDocs` is exactly the same value as the new `DocsRead`, except that `DocsRead` is taken directly from `length(DocAcc)` where as `WrittenDocs` is calculated while iterating over `DocAcc`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-690742754






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487142754



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487064819



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-690742754






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484888032



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),

Review comment:
       `max_tx_time_msec = get_config("batch_max_tx_time_msec", "4500"),` would be better 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485051348



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487998504



##########
File path: src/couch_views/README.md
##########
@@ -7,42 +7,28 @@ Currently only map indexes are supported and it will always return the full inde
 Code layout:
 
 * `couch_views` - Main entry point to query a view
-* `couch_views_reader` - Reads from the index for queries
+* `couch_views_batch` - Dynamically determine optimal batch sizes for view indexers.

Review comment:
       Will do, good catch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484867309



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.

Review comment:
       Should we define and use `-opaque state() :: term().`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-689009537


   > For the meck-via-Elixir, that would have worked if this test were running in the same VM but those tests are run separately against an instance of dev/run.
   
   We have two kinds of Elixir tests
   - integration tests which live in test/elixir/test
   - unit tests which live in `src/<app>/test/exunit/`
   
   Unit tests are run on the same VM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484887449



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,

Review comment:
       `max_tx_time_msec` would be better.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485349921



##########
File path: src/couch_views/README.md
##########
@@ -7,42 +7,28 @@ Currently only map indexes are supported and it will always return the full inde
 Code layout:
 
 * `couch_views` - Main entry point to query a view
-* `couch_views_reader` - Reads from the index for queries
+* `couch_views_batch` - Dynamically determine optimal batch sizes for view indexers.
+* `couch_views_encoding` - Encodes view keys that are byte comparable following CouchDB view sort order.
+* `couch_views_fdb` - Maps view operations to FoundationDB logic.
+* `couch_views_http` - View specific helpers for chttpd
 * `couch_views_indexer` - `couch_jobs` worker that builds an index from the changes feed.
+* `couch_views_reader` - Reads from the index for queries
 * `couch_vews_jobs` - `couch_views` interactions with `couch_jobs`. It handles adding index jobs and subscribes to jobs.
-* `couch_views_fdb` - Maps view operations to FoundationDB logic.
-* `couch_views_encoding` - Encodes view keys that are byte comparable following CouchDB view sort order.
 * `couch_views_server` - Spawns `couch_views_indexer` workers to handle index update jobs.
+* `couch_views_updater` - Update interactive indexes during doc update transactions
+* `couch_views_util` - Various utility functions
 
 # Configuration
 
-## Configuring rate limiter
-
-Here is the example of configuration used in `couch_view` application:
-
-```
-[couch_rate.views]
-limiter = couch_rate_limiter
-opts = #{budget => 100, target => 2500, window => 60000, sensitivity => 1000}
-```
-
-Supported fields in `opts`:
-
-* `budget` - the initial value for estimated batch size
-* `target` - the amount in msec which we try to maintain for batch processing time
-* `window` - time interval for contention detector
-* `sensitivity` - minimal interval within the `window`
-
-Unsupported fields in `opts` (if you really know what you are doing):
-
-* `window_size` - how many batches to consider in contention detector
-* `timer` - this is used for testing to fast forward time `fun() -> current_time_in_ms() end`
-* `target` - the amount in msec which we try to maintain for batch processing time
-* `underload_threshold` - a threshold below which we would try to increase the budget
-* `overload_threshold` - a threshold above which we would start decreasing the budget
-* `delay_threshold` - a threshold above which we would start introducing delays between batches
-* `multiplicative_factor` - determines how fast we are going to decrease budget (must be in (0..1) range)
-* `regular_delay` - delay between batches when there is no overload
-* `congested_delay` - delay between batches when there is an overload
-* `initial_budget` - initial value for budget to start with
-
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       Update the default.ini config with the updated `batch_max_tx_time_msec` name?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp edited a comment on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp edited a comment on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-686742866


   I should mention that my hack to generate the batch_size graphs had a bug where it would double count if a transaction would be retried. That spike to ~8500 just happens to be at the transition in this graph. There's always a conflict towards the start of view indexing due to a conflict on the `couch_jobs` data that I have yet to track down.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-688869483


   In regards to `Move error reporting test to EUnit` commit you've mentioned: 
   
   > This test doesn't fail correctly any longer. Rather than attempting to
   create a new pathological view case I've just moved it to eunit where we can use meck to throw errors directly.
   
   The meck can be used from ExUnit. 
   ```
     defp with_fdb_error(_ctx) do
       :ok = :meck.new(:couch_views_batch, [:passthrough])
       :meck.expect(:couch_views_batch, :start, [], fn() ->
         :erlang.error({:erlfdb_error, 2101})
       end)
       on_exit(fn ->
         :meck.unload()
       end)
     end
   
   describe "something something" do
       setup [....,  :with_fdb_error]
       test "something something", ctx do
          ...
       end
   end
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-689060998


   @davisp Looks, let's skip the metrics for now, could be a future enhancement
   
   I am getting a consistent timeout in the `couch_views_indexer_test:39: indexer_test_ (indexed_empty_db)...` test while it passes on prototype/fdb-layer. Looking a bit into that


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484871069



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(validate_opts(St));
+
+start(#batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(State::term()) -> NewState::term().
+failure(#batch_st{} = St) ->
+    St#batch_st{
+        size = erlang:max(1, St#batch_st.size div 2),
+        state = sense
+    }.
+
+
+validate_opts(St) ->
+    #batch_st{
+        size = Size,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = Penalty
+    } = St,
+    St#batch_st{
+        size = non_neg_integer(Size, batch_initial_size),
+        search_incr = non_neg_integer(SearchIncr, batch_search_increment),
+        sense_incr = non_neg_integer(SenseIncr, batch_sense_increment),
+        max_tx_size = non_neg_integer(MaxTxSize, batch_max_tx_size),
+        max_tx_time = non_neg_integer(MaxTxTime, batch_max_tx_time),
+        threshold_penalty = float_0_to_1(Penalty, batch_threshold_penalty)
+    }.
+
+
+get_config(Key, Default) ->
+    config:get("couch_views", Key, Default).
+
+
+non_neg_integer(Str, Name) ->
+    try
+        Val = list_to_integer(Str),
+        true = Val > 0,
+        Val
+    catch _:_ ->
+        erlang:error({invalid_non_neg_integer, {couch_views, Name, Str}})
+    end.
+
+
+float_0_to_1(Str, Name) ->
+    Val = try
+        list_to_float(Str)
+    catch error:badarg ->
+        erlang:error({invalid_float, {couch_views, Name, Str}})
+    end,
+    if Val >= 0.0 andalso Val =< 1.0 -> Val; true ->
+        erlang:error({float_out_of_range, {couch_views, Name, Str}})
+    end.
+
+
+load_state() ->
+    get(?MODULE).
+
+
+save_state(Mod, Batch) ->

Review comment:
       Should we rename `Batch` into `State`? Because we call it as `save_state(Mod, NewState),`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-686742866


   I should mention that my hack to generate the batch_size graphs had a bug where it would double count if a transaction would be retried. that spike to ~8500 just happens to be at the transition in this graph. There's always a conflict towards the start of view indexing due to a conflict on the `couch_jobs` data that I have yet to track down.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485506554



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       The `couch_rate:success/2` callback used `WrittenDocs` argument. Would it be possible to pass `DocsWrite` to the callback? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485349921



##########
File path: src/couch_views/README.md
##########
@@ -7,42 +7,28 @@ Currently only map indexes are supported and it will always return the full inde
 Code layout:
 
 * `couch_views` - Main entry point to query a view
-* `couch_views_reader` - Reads from the index for queries
+* `couch_views_batch` - Dynamically determine optimal batch sizes for view indexers.
+* `couch_views_encoding` - Encodes view keys that are byte comparable following CouchDB view sort order.
+* `couch_views_fdb` - Maps view operations to FoundationDB logic.
+* `couch_views_http` - View specific helpers for chttpd
 * `couch_views_indexer` - `couch_jobs` worker that builds an index from the changes feed.
+* `couch_views_reader` - Reads from the index for queries
 * `couch_vews_jobs` - `couch_views` interactions with `couch_jobs`. It handles adding index jobs and subscribes to jobs.
-* `couch_views_fdb` - Maps view operations to FoundationDB logic.
-* `couch_views_encoding` - Encodes view keys that are byte comparable following CouchDB view sort order.
 * `couch_views_server` - Spawns `couch_views_indexer` workers to handle index update jobs.
+* `couch_views_updater` - Update interactive indexes during doc update transactions
+* `couch_views_util` - Various utility functions
 
 # Configuration
 
-## Configuring rate limiter
-
-Here is the example of configuration used in `couch_view` application:
-
-```
-[couch_rate.views]
-limiter = couch_rate_limiter
-opts = #{budget => 100, target => 2500, window => 60000, sensitivity => 1000}
-```
-
-Supported fields in `opts`:
-
-* `budget` - the initial value for estimated batch size
-* `target` - the amount in msec which we try to maintain for batch processing time
-* `window` - time interval for contention detector
-* `sensitivity` - minimal interval within the `window`
-
-Unsupported fields in `opts` (if you really know what you are doing):
-
-* `window_size` - how many batches to consider in contention detector
-* `timer` - this is used for testing to fast forward time `fun() -> current_time_in_ms() end`
-* `target` - the amount in msec which we try to maintain for batch processing time
-* `underload_threshold` - a threshold below which we would try to increase the budget
-* `overload_threshold` - a threshold above which we would start decreasing the budget
-* `delay_threshold` - a threshold above which we would start introducing delays between batches
-* `multiplicative_factor` - determines how fast we are going to decrease budget (must be in (0..1) range)
-* `regular_delay` - delay between batches when there is no overload
-* `congested_delay` - delay between batches when there is an overload
-* `initial_budget` - initial value for budget to start with
-
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       Update readme with the new `batch_max_tx_time_msec` name too




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485073115



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -165,48 +171,60 @@ add_error(Error, Reason, Data) ->
 
 
 update(#{} = Db, Mrst0, State0) ->
-    Limiter = maps:get(limiter, State0),
-    case couch_rate:budget(Limiter) of
-        0 ->
-            couch_rate:wait(Limiter),
-            update(Db, Mrst0, State0);
-        Limit ->
-            {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit, limiter => Limiter}),
-            case State1 of
-                finished ->
-                    couch_eval:release_map_context(Mrst1#mrst.qserver);
-                _ ->
-                    couch_rate:wait(Limiter),
-                    update(Db, Mrst1, State1)
+    Limit = couch_views_batch:start(),
+    try
+        {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit}),
+        case State1 of
+            finished ->
+                couch_eval:release_map_context(Mrst1#mrst.qserver);
+            _ ->
+                #{
+                    docs_read := DocsRead,
+                    tx_size := TxSize
+                } = State1,
+                couch_views_batch:success(TxSize, DocsRead),
+                update(Db, Mrst1, State1)

Review comment:
       Ooooh, good catch. Will fix that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485341066



##########
File path: src/couch_views/src/couch_views_batch_impl.erl
##########
@@ -0,0 +1,147 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_impl).
+
+-behavior(couch_views_batch).
+
+
+-export([
+    start/2,
+    success/4,
+    failure/2
+]).
+
+
+-include("couch_mrview/include/couch_mrview.hrl").
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time_msec,
+    threshold_penalty
+}).
+
+
+-spec start(
+        Mrst::#mrst{},
+        State::term()
+    ) -> {NewState::term(), BatchSize::pos_integer()}.
+start(Mrst, undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time_msec = get_config("batch_max_tx_time_msec", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(Mrst, validate_opts(St));
+
+start(_Mrst, #batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        Mrst::#mrst{},
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(_Mrst, TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time_msec = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(Mrst::#mrst{}, State::term()) -> NewState::term().
+failure(_Mrst, #batch_st{} = St) ->
+    St#batch_st{
+        size = erlang:max(1, St#batch_st.size div 2),
+        state = sense
+    }.
+
+
+validate_opts(St) ->
+    #batch_st{
+        size = Size,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time_msec = MaxTxTime,
+        threshold_penalty = Penalty
+    } = St,
+    St#batch_st{
+        size = non_neg_integer(Size, batch_initial_size),
+        search_incr = non_neg_integer(SearchIncr, batch_search_increment),
+        sense_incr = non_neg_integer(SenseIncr, batch_sense_increment),
+        max_tx_size = non_neg_integer(MaxTxSize, batch_max_tx_size),
+        max_tx_time_msec = non_neg_integer(MaxTxTime, batch_max_tx_time_msec),
+        threshold_penalty = float_0_to_1(Penalty, batch_threshold_penalty)

Review comment:
       Would this be shorter to combine with `get_config/2`. Maybe something like:
   
   ```
   St = #batch_st{
       size = get_config(batch_initial_size, "100", fun non_neg_integer/2),
       search_incr = get_config(batch_search_increment, "500", fun non_neg_integer/2),
       ...
   }
   ```
   
   Then have 
   
   ```
   get_config(Key, Default, ValFun) when is_function(ValFun, 2) ->
       Str = config:get("couch_views", atom_to_list(Key), Default),
       ValFun(Str, Key).
   ```
   
   Up to you if you feel like it would eliminate the extra unpacking from validation...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485052752



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -165,48 +171,60 @@ add_error(Error, Reason, Data) ->
 
 
 update(#{} = Db, Mrst0, State0) ->
-    Limiter = maps:get(limiter, State0),
-    case couch_rate:budget(Limiter) of
-        0 ->
-            couch_rate:wait(Limiter),
-            update(Db, Mrst0, State0);
-        Limit ->
-            {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit, limiter => Limiter}),
-            case State1 of
-                finished ->
-                    couch_eval:release_map_context(Mrst1#mrst.qserver);
-                _ ->
-                    couch_rate:wait(Limiter),
-                    update(Db, Mrst1, State1)
+    Limit = couch_views_batch:start(),
+    try
+        {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit}),
+        case State1 of
+            finished ->
+                couch_eval:release_map_context(Mrst1#mrst.qserver);
+            _ ->
+                #{
+                    docs_read := DocsRead,
+                    tx_size := TxSize
+                } = State1,
+                couch_views_batch:success(TxSize, DocsRead),
+                update(Db, Mrst1, State1)
+        end
+    catch
+        error:{erlfdb_error, Error} ->
+            case lists:member(Error, ?RETRY_FAILURES) of
+                true ->
+                    couch_views_batch:failure(),
+                    update(Db, Mrst0, State0);
+                false ->
+                    erlang:error({erlfdb_error, Error})

Review comment:
       Will this loose the original stacktrace? It might not be as nice but we could defined ?RETRY_FAILURES as a guard maybe. Would this work:
   
   ```erlang
   -define(RETRY_FAILURE(E), (E =:= 1007 orelse E =:= 1031 ...)).
   ```
   And then say `catch error:{erlfdb_error, E} when ?RETRY_FAILURE(E) ...` ?

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -165,48 +171,60 @@ add_error(Error, Reason, Data) ->
 
 
 update(#{} = Db, Mrst0, State0) ->
-    Limiter = maps:get(limiter, State0),
-    case couch_rate:budget(Limiter) of
-        0 ->
-            couch_rate:wait(Limiter),
-            update(Db, Mrst0, State0);
-        Limit ->
-            {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit, limiter => Limiter}),
-            case State1 of
-                finished ->
-                    couch_eval:release_map_context(Mrst1#mrst.qserver);
-                _ ->
-                    couch_rate:wait(Limiter),
-                    update(Db, Mrst1, State1)
+    Limit = couch_views_batch:start(),
+    try
+        {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit}),
+        case State1 of
+            finished ->
+                couch_eval:release_map_context(Mrst1#mrst.qserver);
+            _ ->
+                #{
+                    docs_read := DocsRead,
+                    tx_size := TxSize
+                } = State1,
+                couch_views_batch:success(TxSize, DocsRead),
+                update(Db, Mrst1, State1)

Review comment:
       Because of the catch that wraps it, would this still be tail recursive? Say if we call this 1M times would it have to keep the stack frames around to unwind after the call?

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -78,10 +86,10 @@ init() ->
         fail_job(Job, Data, sig_changed, "Design document was modified")
     end,
 
-    Limiter = couch_rate:create_if_missing({DbName, DDocId}, "views"),
-
     State = #{
         tx_db => undefined,
+        tx_size => 0,

Review comment:
       Minor nit: would these make sense emitted as metrics. It could a separate PR though

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,68 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, "couch_views_batch_impl").
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),

Review comment:
       Wonder if it makes sense to pass in dbname and/or ddoc id to the `Mod:start/1` so the batch maximizer could apply different rules to different databases or design docs perhaps. For example, like we pass in the DocsRead below even though we don't actually use it in the basic implementation.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484888032



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),

Review comment:
       `max_tx_time_msec` would be better 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487691585



##########
File path: src/couch_views/README.md
##########
@@ -7,42 +7,28 @@ Currently only map indexes are supported and it will always return the full inde
 Code layout:
 
 * `couch_views` - Main entry point to query a view
-* `couch_views_reader` - Reads from the index for queries
+* `couch_views_batch` - Dynamically determine optimal batch sizes for view indexers.

Review comment:
       Can you add in `couch_views_batch_impl` here. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485073562



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -165,48 +171,60 @@ add_error(Error, Reason, Data) ->
 
 
 update(#{} = Db, Mrst0, State0) ->
-    Limiter = maps:get(limiter, State0),
-    case couch_rate:budget(Limiter) of
-        0 ->
-            couch_rate:wait(Limiter),
-            update(Db, Mrst0, State0);
-        Limit ->
-            {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit, limiter => Limiter}),
-            case State1 of
-                finished ->
-                    couch_eval:release_map_context(Mrst1#mrst.qserver);
-                _ ->
-                    couch_rate:wait(Limiter),
-                    update(Db, Mrst1, State1)
+    Limit = couch_views_batch:start(),
+    try
+        {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit}),
+        case State1 of
+            finished ->
+                couch_eval:release_map_context(Mrst1#mrst.qserver);
+            _ ->
+                #{
+                    docs_read := DocsRead,
+                    tx_size := TxSize
+                } = State1,
+                couch_views_batch:success(TxSize, DocsRead),
+                update(Db, Mrst1, State1)
+        end
+    catch
+        error:{erlfdb_error, Error} ->
+            case lists:member(Error, ?RETRY_FAILURES) of
+                true ->
+                    couch_views_batch:failure(),
+                    update(Db, Mrst0, State0);
+                false ->
+                    erlang:error({erlfdb_error, Error})

Review comment:
       That seems reasonable. I'll update to use that pattern.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487064819



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485052363



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(validate_opts(St));
+
+start(#batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().

Review comment:
       These specs match the callbacks of the behavior for couch_views_batch so I'm not sure whether that's appropriate or not?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484894589



##########
File path: rel/overlay/etc/default.ini
##########
@@ -309,6 +309,14 @@ iterations = 10 ; iterations for password hashing
 ;
 ; The maximum allowed value size emitted from a view for a document (in bytes)
 ;value_size_limit = 64000
+;
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       `batch_max_tx_time_msec` would be better




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485324919



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,68 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, "couch_views_batch_impl").
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),

Review comment:
       Good idea passing in the mrst record, even better than just the dbname and docid 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484887838



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),

Review comment:
       `max_tx_time_msec` would be better 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485343899



##########
File path: src/couch_views/test/couch_views_batch_test.erl
##########
@@ -0,0 +1,83 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_test).
+
+
+-include_lib("eunit/include/eunit.hrl").
+-include_lib("fabric/test/fabric2_test.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+batch_test_() ->
+    {
+        "Test view batch sizing",
+        {
+            setup,
+            fun setup/0,
+            fun cleanup/1,
+            with([
+                ?TDEF(basic),
+                ?TDEF(search_success),
+                ?TDEF(sense_success),
+                ?TDEF(failure),
+                ?TDEF(failure_switches_to_sense)
+            ])
+        }
+    }.
+
+
+setup() ->
+    test_util:start_couch().
+
+
+cleanup(Ctx) ->
+    test_util:stop_couch(Ctx).
+
+
+basic(_) ->
+    erase(couch_views_batch),
+    ?assertEqual(100, couch_views_batch:start(#mrst{})).
+
+
+search_success(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:success(#mrst{}, 0, 0),
+    ?assertEqual(600, couch_views_batch:start(#mrst{})).
+
+
+sense_success(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    % Exceeding our threshold switches from search to sense
+    couch_views_batch:success(#mrst{}, 10000000, 5000),
+    ?assertEqual(80, couch_views_batch:start(#mrst{})),
+    couch_views_batch:success(#mrst{}, 0, 0),
+    ?assertEqual(180, couch_views_batch:start(#mrst{})).
+
+
+failure(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:failure(#mrst{}),
+    ?assertEqual(50, couch_views_batch:start(#mrst{})).
+
+
+failure_switches_to_sense(_) ->
+    erase(couch_views_batch),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:failure(#mrst{}),
+    couch_views_batch:start(#mrst{}),
+    couch_views_batch:success(#mrst{}, 0, 0),
+    ?assertEqual(150, couch_views_batch:start(#mrst{})).
+

Review comment:
       Super tiny nit: extra newline




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484871069



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(validate_opts(St));
+
+start(#batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(State::term()) -> NewState::term().
+failure(#batch_st{} = St) ->
+    St#batch_st{
+        size = erlang:max(1, St#batch_st.size div 2),
+        state = sense
+    }.
+
+
+validate_opts(St) ->
+    #batch_st{
+        size = Size,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = Penalty
+    } = St,
+    St#batch_st{
+        size = non_neg_integer(Size, batch_initial_size),
+        search_incr = non_neg_integer(SearchIncr, batch_search_increment),
+        sense_incr = non_neg_integer(SenseIncr, batch_sense_increment),
+        max_tx_size = non_neg_integer(MaxTxSize, batch_max_tx_size),
+        max_tx_time = non_neg_integer(MaxTxTime, batch_max_tx_time),
+        threshold_penalty = float_0_to_1(Penalty, batch_threshold_penalty)
+    }.
+
+
+get_config(Key, Default) ->
+    config:get("couch_views", Key, Default).
+
+
+non_neg_integer(Str, Name) ->
+    try
+        Val = list_to_integer(Str),
+        true = Val > 0,
+        Val
+    catch _:_ ->
+        erlang:error({invalid_non_neg_integer, {couch_views, Name, Str}})
+    end.
+
+
+float_0_to_1(Str, Name) ->
+    Val = try
+        list_to_float(Str)
+    catch error:badarg ->
+        erlang:error({invalid_float, {couch_views, Name, Str}})
+    end,
+    if Val >= 0.0 andalso Val =< 1.0 -> Val; true ->
+        erlang:error({float_out_of_range, {couch_views, Name, Str}})
+    end.
+
+
+load_state() ->
+    get(?MODULE).
+
+
+save_state(Mod, Batch) ->

Review comment:
       Should we rename `Batch` into `State`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484888421



##########
File path: src/couch_views/README.md
##########
@@ -16,3 +17,18 @@ Code layout:
 * `couch_views_server` - Spawns `couch_views_indexer` workers to handle index update jobs.
 * `couch_views_updater` - Update interactive indexes during doc update transactions
 * `couch_views_util` - Various utility functions
+
+# Configuration
+
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes
+; batch_max_tx_time = 4500 ; Maximum transaction time in milliseconds

Review comment:
       `max_tx_time_msec` would be better 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485341066



##########
File path: src/couch_views/src/couch_views_batch_impl.erl
##########
@@ -0,0 +1,147 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_impl).
+
+-behavior(couch_views_batch).
+
+
+-export([
+    start/2,
+    success/4,
+    failure/2
+]).
+
+
+-include("couch_mrview/include/couch_mrview.hrl").
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time_msec,
+    threshold_penalty
+}).
+
+
+-spec start(
+        Mrst::#mrst{},
+        State::term()
+    ) -> {NewState::term(), BatchSize::pos_integer()}.
+start(Mrst, undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time_msec = get_config("batch_max_tx_time_msec", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(Mrst, validate_opts(St));
+
+start(_Mrst, #batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        Mrst::#mrst{},
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(_Mrst, TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time_msec = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(Mrst::#mrst{}, State::term()) -> NewState::term().
+failure(_Mrst, #batch_st{} = St) ->
+    St#batch_st{
+        size = erlang:max(1, St#batch_st.size div 2),
+        state = sense
+    }.
+
+
+validate_opts(St) ->
+    #batch_st{
+        size = Size,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time_msec = MaxTxTime,
+        threshold_penalty = Penalty
+    } = St,
+    St#batch_st{
+        size = non_neg_integer(Size, batch_initial_size),
+        search_incr = non_neg_integer(SearchIncr, batch_search_increment),
+        sense_incr = non_neg_integer(SenseIncr, batch_sense_increment),
+        max_tx_size = non_neg_integer(MaxTxSize, batch_max_tx_size),
+        max_tx_time_msec = non_neg_integer(MaxTxTime, batch_max_tx_time_msec),
+        threshold_penalty = float_0_to_1(Penalty, batch_threshold_penalty)

Review comment:
       Would this be shorter to combine with `get_config/2`. Maybe something like:
   
   ```
   St = #batch_st{
       size = get_config(batch_initial_size, "100", fun non_neg_integer/2),
       search_incr = get_config(batch_search_increment, "500", fun non_neg_integer/2),
       ...
   }
   ```
   
   Then have 
   
   ```
   get_config(Key, Default, ValFun) when is_function(ValFun, 2) ->
       Str = config:get("couch_views", atom_to_list(Key), Default),
       ValFun(Str, Key).
   ```
   
   Up to you if you feel like it would eliminate the extra struct unpacking and re-building in order to validate it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-690742754


   @davisp lgtm +1 to merge from me, also good job adding extra tests!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-690739310


   @nickva I've addressed all of your suggestions. Let me know if you have anything else.
   
   @iilyak @garrensmith I'm pretty sure I've addressed all of your concerns, let me know if there's anything left to handle.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-688849471


   > I'm not convinced about having the default implementation of batch_module in the same file as couch_views_batch. I initially found it confusing trying to figure out where it starts and where couch_views_batch is. I would prefer we have a couch_batch module similar to couch_rate with the default module and couch_views_batch.
   
   I agree with Garren. This separation would also hide the `#batch_st{}` record in `couch_views_batch` module.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-690742754


   @davisp lgtm +1 to merge, also good job adding extra tests!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487064819



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       My understanding is that map function can emit 0 or more documents, therefore `DocsRead /= DocsWritten`:
   
   - DocsRead  - `length(DocAcc)`
   - DocsWritten - `length(MappedDocs)`
   
   In `couch_rate` the `WrittenDocs` was a result of iterating over `MappedDocs` https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_views/src/couch_views_indexer.erl#L205
   
   The reason I calculated it during iteration is to avoid extra `O(n)` cost on `length(List)`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485051815



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.

Review comment:
       I'm not sure what that buys us or how it'd interact with other modules implementing the behavior?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485144522



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.

Review comment:
       With you latest rewrite and separation into two modules it is not required to use `opaque`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r488626424



##########
File path: rel/overlay/etc/default.ini
##########
@@ -312,6 +312,14 @@ iterations = 10 ; iterations for password hashing
 ;
 ; The maximum allowed value size emitted from a view for a document (in bytes)
 ;value_size_limit = 64000
+;
+; Batch size sensing parameters
+; batch_initial_size = 100 ; Initial batch size in number of documents
+; batch_search_increment = 500 ; Size change when searching for the threshold
+; batch_sense_increment = 100 ; Size change increment after hitting a threshold
+; batch_max_tx_size = 9000000 ; Maximum transaction size in bytes

Review comment:
       Should we name it `batch_max_tx_size_bytes` to make it clear that units are different compared to `batch_initial_size`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-689002928


   @garrensmith I've split the behavior/implementation between `couch_views_batch` (behavior) and `couch_views_batch_impl` (implementation). I don't believe this behavior is nearly abstract enough to cover anything besides couch_views so it doesn't make a lot of sense to me to create a new application to hold a new module that only couch_views will use.
   
   @iilyak For the scope I agree that this is a lot more narrow. I started by trying to figure out the AIMD/congestion algorithms in both couch_rate and couch_replicator but after thinking awhile I realized that having an AIMD based algorithm isn't appropriate for batch sizes since we want to avoid failing batches as much as possible because every failed batch is wasted time when building a view. I did a number of different approaches to try and measure optimal batch sizes and behaviors. An earlier version of this work actually tried to optimize the rate of indexing within the bounds of transaction sizes. That lead to the realization that just maximizing the work done in a single transaction was most beneficial overall for indexing throughput. That's how it ended up being so simple and narrow in the end.
   
   For the meck-via-Elixir, that would have worked if this test were running in the same VM but those tests are run separately against an instance of `dev/run`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r488134125



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       I've tweaked the API to include the total number of emitted KVs and switched to using a map with a defined type exported from `couch_views_batch`. Let me know if you can think of any other useful values or if those three are good enough for now.

##########
File path: src/couch_views/README.md
##########
@@ -7,42 +7,28 @@ Currently only map indexes are supported and it will always return the full inde
 Code layout:
 
 * `couch_views` - Main entry point to query a view
-* `couch_views_reader` - Reads from the index for queries
+* `couch_views_batch` - Dynamically determine optimal batch sizes for view indexers.

Review comment:
       Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487142754



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r487142754



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,80 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+
+
+-callback start(
+            Mrst::#mrst{},
+            State::term()
+        ) -> {NewState::term(), BatchSize::pos_integer()}.
+
+-callback success(
+            Mrst::#mrst{},
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),

Review comment:
       A map function can emit 0 or more rows for each of the defined views. For every doc we have a result set regardless of whether it has rows or not (because we have to remember we may be removing a document from a view). Regardless of whether a doc emits rows or not, it still has an entry in `MappedDocs` and `length(DocAcc) == length(MappedDocs)`.
   
   I could see counting up the number of rows emitted for each document as maybe being an interesting value to have access to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-689206296


   @davisp Found the issue, couch_views builds are were fast enough to trigger a race condition in couch_jobs monitoring code. The job was completing before the type monitor for couch_views jobs even started.
   
   Made a PR to fix it:
     https://github.com/apache/couchdb/pull/3135


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485072389



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -78,10 +86,10 @@ init() ->
         fail_job(Job, Data, sig_changed, "Design document was modified")
     end,
 
-    Limiter = couch_rate:create_if_missing({DbName, DDocId}, "views"),
-
     State = #{
         tx_db => undefined,
+        tx_size => 0,

Review comment:
       I was noodling over a way to dynamically enable the tracing logs I've been using while developing this. As metrics I think the cardinality would either be high enough it'd crush most metric collectors or coarse enough that it wouldn't be super helpful.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485334382



##########
File path: src/couch_views/src/couch_views_batch_impl.erl
##########
@@ -0,0 +1,147 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch_impl).
+
+-behavior(couch_views_batch).
+
+
+-export([
+    start/2,
+    success/4,
+    failure/2
+]).
+
+
+-include("couch_mrview/include/couch_mrview.hrl").
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time_msec,
+    threshold_penalty
+}).
+
+
+-spec start(
+        Mrst::#mrst{},
+        State::term()
+    ) -> {NewState::term(), BatchSize::pos_integer()}.
+start(Mrst, undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time_msec = get_config("batch_max_tx_time_msec", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")

Review comment:
       Minor nit: maybe set `state = search` explicitly here (as well or just here), just so all the state initialization is in one place and it's obvious that it always starts in `search`, stays there until it hits the threshold, then remains in `sense` after that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-689074435


   @nickva I sometimes have one but it seems to be only occasionally for the first run after compiling and then doesn't happen again. Let me know if you find anything though. Flaky tests are never good.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r485071468



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,68 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, "couch_views_batch_impl").
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),

Review comment:
       Good idea! I'll pass in the Mrst record to each of the three functions since its a public record that contains the database name and design document id.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] iilyak commented on pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
iilyak commented on pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#issuecomment-688859591


   Just wanted to mention that the scope of `couch_rate` was bigger. The idea behind `couch_rate` was to provide a `rate limiter`/`batch size estimator` which can be used anywhere where we have a chance of a transaction failure from FDB and where we can be better off by adjusting transaction size (one possible use case is to unify with https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_replicator/src/couch_replicator_rate_limiter.erl). However since it seems like we are getting [a consensus that we should introduce limits](https://lists.apache.org/thread.html/rac6c90c4ae03dc055c7e8be6eca1c1e173cf2f98d2afe6d018e62d29%40%3Cdev.couchdb.apache.org%3E) we might not need a generic solution and `couch_view` specific one might suffice. 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3127: Prototype/fdb layer replace couch rate

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3127:
URL: https://github.com/apache/couchdb/pull/3127#discussion_r484219166



##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(validate_opts(St));
+
+start(#batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().

Review comment:
       Could you make this spec more specific and state that we returning the `#batch_st` record. And same for the input. Its expecting a `#batch_st`

##########
File path: src/couch_views/src/couch_views_batch.erl
##########
@@ -0,0 +1,192 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_views_batch).
+
+
+-export([
+    start/0,
+    success/2,
+    failure/0
+]).
+
+-export([
+    start/1,
+    success/3,
+    failure/1
+]).
+
+
+-callback start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+-callback success(
+            TxSize::non_neg_integer(),
+            DocsRead::non_neg_integer(),
+            State::term()
+        ) -> NewState::term().
+-callback failure(State::term()) -> NewState::term().
+
+
+-define(DEFAULT_MOD, atom_to_list(?MODULE)).
+
+
+-record(batch_st, {
+    start_time,
+    size,
+    state = search,
+    search_incr,
+    sense_incr,
+    max_tx_size,
+    max_tx_time,
+    threshold_penalty
+}).
+
+
+start() ->
+    {Mod, State} = case load_state() of
+        {M, S} ->
+            {M, S};
+        undefined ->
+            ModStr = config:get("couch_views", "batch_module", ?DEFAULT_MOD),
+            ModAtom = list_to_existing_atom(ModStr),
+            {ModAtom, undefined}
+    end,
+    {NewState, BatchSize} = Mod:start(State),
+    save_state(Mod, NewState),
+    BatchSize.
+
+
+success(TxSize, DocsRead) ->
+    {Mod, State} = load_state(),
+    NewState = Mod:success(TxSize, DocsRead, State),
+    save_state(Mod, NewState),
+    ok.
+
+
+failure() ->
+    {Mod, State} = load_state(),
+    NewState = Mod:failure(State),
+    save_state(Mod, NewState),
+    ok.
+
+
+-spec start(State::term()) -> {NewState::term(), BatchSize::pos_integer()}.
+start(undefined) ->
+    St = #batch_st{
+        size = get_config("batch_initial_size", "100"),
+        search_incr = get_config("batch_search_increment", "500"),
+        sense_incr = get_config("batch_sense_increment", "100"),
+        max_tx_size = get_config("batch_max_tx_size", "9000000"),
+        max_tx_time = get_config("batch_max_tx_time", "4500"),
+        threshold_penalty = get_config("batch_threshold_penalty", "0.2")
+    },
+    start(validate_opts(St));
+
+start(#batch_st{size = Size} = St) ->
+    NewSt = St#batch_st{
+        start_time = erlang:monotonic_time()
+    },
+    {NewSt, Size}.
+
+
+-spec success(
+        TxSize::non_neg_integer(),
+        DocsRead::non_neg_integer(),
+        State::term()
+    ) -> NewState::term().
+success(TxSize, _DocsRead, #batch_st{} = St) ->
+    #batch_st{
+        start_time = StartTime,
+        size = Size,
+        state = State,
+        search_incr = SearchIncr,
+        sense_incr = SenseIncr,
+        max_tx_size = MaxTxSize,
+        max_tx_time = MaxTxTime,
+        threshold_penalty = ThresholdPenalty
+    } = St,
+
+    TxTimeNative = erlang:monotonic_time() - StartTime,
+    TxTime = erlang:convert_time_unit(TxTimeNative, native, millisecond),
+
+    {NewSize, NewState} = case TxSize > MaxTxSize orelse TxTime > MaxTxTime of
+        true ->
+            {round(Size * (1.0 - ThresholdPenalty)), sense};
+        false when State == search ->
+            {Size + SearchIncr, State};
+        false when State == sense ->
+            {Size + SenseIncr, State}
+    end,
+
+    St#batch_st{
+        size = erlang:max(1, NewSize),
+        state = NewState
+    }.
+
+
+-spec failure(State::term()) -> NewState::term().

Review comment:
       Can you also make these specs more specific

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -165,48 +171,60 @@ add_error(Error, Reason, Data) ->
 
 
 update(#{} = Db, Mrst0, State0) ->
-    Limiter = maps:get(limiter, State0),
-    case couch_rate:budget(Limiter) of
-        0 ->
-            couch_rate:wait(Limiter),
-            update(Db, Mrst0, State0);
-        Limit ->
-            {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit, limiter => Limiter}),
-            case State1 of
-                finished ->
-                    couch_eval:release_map_context(Mrst1#mrst.qserver);
-                _ ->
-                    couch_rate:wait(Limiter),
-                    update(Db, Mrst1, State1)
+    Limit = couch_views_batch:start(),
+    try
+        {Mrst1, State1} = do_update(Db, Mrst0, State0#{limit => Limit}),
+        case State1 of
+            finished ->
+                couch_eval:release_map_context(Mrst1#mrst.qserver);
+            _ ->
+                #{
+                    docs_read := DocsRead,
+                    tx_size := TxSize
+                } = State1,
+                couch_views_batch:success(TxSize, DocsRead),
+                update(Db, Mrst1, State1)
+        end
+    catch
+        error:{erlfdb_error, Error} ->
+            case lists:member(Error, ?RETRY_FAILURES) of

Review comment:
       This is really nice. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org