You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/07/14 05:25:34 UTC

[GitHub] [couchdb] tonysun83 opened a new pull request #3003: add active_tasks for view builds using version stamps

tonysun83 opened a new pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003


   
   <!-- Thank you for your contribution!
   
        Please file this form by replacing the Markdown comments
        with your text. If a section needs no action - remove it.
   
        Also remember, that CouchDB uses the Review-Then-Commit (RTC) model
        of code collaboration. Positive feedback is represented +1 from committers
        and negative is a -1. The -1 also means veto, and needs to be addressed
        to proceed. Once there are no objections, the PR can be merged by a
        CouchDB committer.
   
        See: http://couchdb.apache.org/bylaws.html#decisions for more info. -->
   
   ## Overview
   
   Active Tasks requires TotalChanges and ChangesDone to show the progress
   of long running tasks. This requires `count_changes_since/2` to be
   implemented. Unfortunately, that is not easily done  with
   foundationdb. This commit replaces TotalChanges with the
   versionstamp + the number of docs as a progress indicator. This can
   possibly break existing api that relys on TotalChanges. ChangesDone
   will still exist, but instead of relying on the current changes seq
   it is simply a reflection of how many documents were written by the
   updater process.
   
   The new responses look like this:
   
   ```
   [
       {
           "node": "node1@127.0.0.1",
           "pid": "<0.622.0>",
           "changes_done": 199,
           "current_version_stamp": "8131141649532-198",
           "database": "testdb",
           "db_version_stamp": "8131141649532-999",
           "design_document": "_design/example",
           "started_on": 1594703583,
           "type": "indexer",
           "updated_on": 1594703586
       }
   ]
   
   [
       {
           "node": "node1@127.0.0.1",
           "pid": "<0.1030.0>",
           "changes_done": 1000,
           "current_version_stamp": "8131194932916-999",
           "database": "testdb",
           "db_version_stamp": "8131194932916-999",
           "design_document": "_design/example",
           "started_on": 1594703636,
           "type": "indexer",
           "updated_on": 1594703665
       }
   ]
   ```
   
   ## Testing recommendations
   
   Tests need to be added. If this approach is accepted, I will add more precise tests. Currently ran manual tests.
   
   
   ## Related Issues or Pull Requests
   
   <!-- If your changes affects multiple components in different
        repositories please put links to those issues or pull requests here.  -->
   
   ## Checklist
   
   - [X] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
   - [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459666294



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,35 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0,
+    get_active_task_info/1,
+
+    update_active_task_info/2

Review comment:
       Check the file with emilio, it might complain that the order of exports doesn't match the definition order. So might have to put `update_active_task_info/2` first, or change the definition order, whichever looks better.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r460142348



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,51 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0,
+    get_active_task_info/1,
+
+    update_active_task_info/2
+]).
+
+
+-define(ACTIVE_TASK_INFO, <<"active_task_info">>).
+
+
+get_active_tasks() ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined), fun(JTx) ->
+        Types = couch_jobs:get_types(JTx),
+        lists:foldl(fun(Type, TaskAcc) ->
+            JobIds = couch_jobs:get_active_jobs_ids(JTx, Type),
+            Tasks = lists:filtermap(fun(JobId) ->
+                {ok, Data} = couch_jobs:get_job_data(JTx, Type, JobId),
+                case maps:get(?ACTIVE_TASK_INFO, Data, not_found) of
+                    not_found -> false;
+                    Info -> {true, Info}
+                end
+            end, JobIds),
+            TaskAcc ++ Tasks
+        end, [], Types)
+    end).
+
+
+get_active_task_info(JobData) ->
+    #{?ACTIVE_TASK_INFO:= ActiveTaskInfo} = JobData,
+    ActiveTaskInfo.
+
+
+update_active_task_info(JobData, ActiveTaskInfo) ->
+     JobData#{?ACTIVE_TASK_INFO => ActiveTaskInfo}.

Review comment:
       oh, I thought I changed this. weird. thx




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459533390



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,28 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0
+]).
+
+
+-include_lib("couch/include/couch_db.hrl").
+-include("fabric2.hrl").
+
+
+-define(ACTIVE_TASK_JOB_TYPES, [<<"views">>, <<"replication">>]).

Review comment:
       To avoid hard-coding the types we could just let any job expose the `"active_task_info"` object and we'd automatically display it. So in the get_active_tasks we could query all the types first then call `get_active_jobs(JTx, Type)`
   
   To get all the types we could expose https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321 in `couch_jobs` as 
   
   ```
   get_types(Tx) ->
        couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
           couch_jobs_fdb:get_types(JTx)
       end).
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r454465133



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -231,10 +231,24 @@ maybe_set_build_status(TxDb, Mrst1, _ViewVS, State) ->
 get_update_start_state(TxDb, Mrst, #{db_seq := undefined} = State) ->
     ViewVS = couch_views_fdb:get_creation_vs(TxDb, Mrst),
     ViewSeq = couch_views_fdb:get_update_seq(TxDb, Mrst),
+    DbSeq = fabric2_db:get_update_seq(TxDb),
+    DbVS = fabric2_fdb:seq_to_vs(DbSeq),
+    ViewVS1 = convert_version_stamp(ViewVS),
+    DbVS1 = convert_version_stamp(DbVS),
+
+    couch_task_status:add_task([
+        {type, indexer},
+        {database, Mrst#mrst.db_name},
+        {design_document, Mrst#mrst.idx_name},
+        {changes_done, 0},

Review comment:
       I didn't have a good solution to this. Mainly that the indexer process has to die first which means the active_task tuple is automatically removed from memory. I was thinking of writing a counter to fdb, but that opened up a new can of worms. The idea here is that is that each process will just own a portion of what they've indexed and not care about what's been done previously. In other words, if you have an index that built 900 documents out of a 1000 document db and it had to restart, it'll start from 0 and just get up to 100 for ChangesDone and finish. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-663181606


   @tonysun83 looks great! awesome work. Just a few minor nits, whitespace and indents and such


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662557731


   @davisp : Here was @garrensmith 's concern https://github.com/apache/couchdb/pull/3003#discussion_r457145064. But I think I found a compromise and pushed (now the populate_tasks function is way simpler without the reads). We really just need to add two new items to the JobData info: <<"changes_done">> and <<"db_seq">>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459537603



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -483,17 +489,28 @@ report_progress(State, UpdateType) ->
         tx_db := TxDb,
         job := Job1,
         job_data := JobData,
-        last_seq := LastSeq
+        last_seq := LastSeq,
+        db_seq := DBSeq,
+        changes_done := ChangesDone
     } = State,
 
     #{
         <<"db_name">> := DbName,
         <<"db_uuid">> := DbUUID,
         <<"ddoc_id">> := DDocId,
         <<"sig">> := Sig,
-        <<"retries">> := Retries
+        <<"retries">> := Retries,
+        <<"active_tasks_info">> := ActiveTasks

Review comment:
       To avoid hard-coding the field name in all the application, and having a chance someone mistyping it in the future, let's have accessor functions in `fabric2_active_tasks` so that's the only place which knows about `<<"active_task_info">>`. Could have something like `fabric2_active_tasks:update_active_task_info(JobData, ActiveTaskInfo) -> JobData1` and `get_active_task_info(JobData) -> ActiveTaskInfo` for example




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r457148893



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -483,17 +489,28 @@ report_progress(State, UpdateType) ->
         tx_db := TxDb,
         job := Job1,
         job_data := JobData,
-        last_seq := LastSeq
+        last_seq := LastSeq,
+        db_seq := DBSeq,
+        changes_done := ChangesDone0
     } = State,
 
     #{
         <<"db_name">> := DbName,
         <<"db_uuid">> := DbUUID,
         <<"ddoc_id">> := DDocId,
         <<"sig">> := Sig,
-        <<"retries">> := Retries
+        <<"retries">> := Retries,
+        <<"active_tasks_info">> := ActiveTasks
     } = JobData,
 
+    ChangesDone1 = case maps:get(<<"changes_done">>, ActiveTasks, 0) of

Review comment:
       This can be `ChangesDone = maps:get(<<"changes_done">>, ActiveTasks, 0)` The `0` should be the value if it doesn't exist. 

##########
File path: src/chttpd/src/chttpd_misc.erl
##########
@@ -294,11 +296,25 @@ dbs_info_callback({error, Reason}, #vacc{resp = Resp0} = Acc) ->
 
 handle_task_status_req(#httpd{method='GET'}=Req) ->
     ok = chttpd:verify_is_server_admin(Req),
-    {Replies, _BadNodes} = gen_server:multi_call(couch_task_status, all),
-    Response = lists:flatmap(fun({Node, Tasks}) ->
-        [{[{node,Node} | Task]} || Task <- Tasks]
-    end, Replies),
-    send_json(Req, lists:sort(Response));
+    Jobs = lists:foldl(fun(Type, JobsAcc) ->

Review comment:
       It would be better to move this into `fabric2_fdb`. `fabric2_fdb:get_active_tasks` or something like that

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,35 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    VS = case LastSeq of

Review comment:
       I don't like storing all this active task info in the job. All of this can be generated when CouchDB receive's a `_active_tasks` request. If we keep adding more and more information to the job info means we running the risk of exceeding the FDB value limit. Rather just add the `current_version_stamp` and `db_version_stamp` to the job options. Is `indexer_pid` actually useful? What would a user do with that? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r460137345



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,51 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0,
+    get_active_task_info/1,
+
+    update_active_task_info/2
+]).
+
+
+-define(ACTIVE_TASK_INFO, <<"active_task_info">>).
+
+
+get_active_tasks() ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined), fun(JTx) ->
+        Types = couch_jobs:get_types(JTx),
+        lists:foldl(fun(Type, TaskAcc) ->
+            JobIds = couch_jobs:get_active_jobs_ids(JTx, Type),
+            Tasks = lists:filtermap(fun(JobId) ->
+                {ok, Data} = couch_jobs:get_job_data(JTx, Type, JobId),
+                case maps:get(?ACTIVE_TASK_INFO, Data, not_found) of
+                    not_found -> false;
+                    Info -> {true, Info}
+                end
+            end, JobIds),
+            TaskAcc ++ Tasks
+        end, [], Types)
+    end).
+
+
+get_active_task_info(JobData) ->
+    #{?ACTIVE_TASK_INFO:= ActiveTaskInfo} = JobData,
+    ActiveTaskInfo.
+
+
+update_active_task_info(JobData, ActiveTaskInfo) ->
+     JobData#{?ACTIVE_TASK_INFO => ActiveTaskInfo}.

Review comment:
       Tiny nit: 5 indent instead of 4, add a newline at the end of the file as well, just for so GH diff doesn't show the red warning :-P




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459661222



##########
File path: src/couch_jobs/src/couch_jobs.erl
##########
@@ -104,6 +106,23 @@ get_job_state(Tx, Type, JobId) when is_binary(JobId) ->
     end).
 
 
+-spec get_active_jobs_ids(jtx(), job_type()) -> [job_id()] | {error,
+    any()}.
+get_active_jobs_ids(Tx, Type) ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
+        Since = couch_jobs_fdb:get_active_since(JTx, Type,
+            {versionstamp, 0, 0}),
+        maps:keys(Since)
+    end).
+
+
+-spec get_types(jtx()) -> [job_type()] | {error, any()}.
+get_types(Tx) ->
+     couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->

Review comment:
       Minor nit: I think it's a 5 space indent instead of 4




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662529977


   @tonysun83 looks better! Good improvement. I made a few comments-in line.
   
   One part that I am not sure if it is worth starting two transaction per task item just to compute version stamp and maybe a few other fields. That seems quite expensive if say someone is polling status on a cluster with hundreds of tasks, given that jobs already have the dbs open and all that data available to start with. The reason we decided to do that was to ensure we are below 100KB for JobData, I think?
   
   We'd have to make a convincing case that our active_jobs status section in each JobData would have a chance of pushing it over the 100KB value limit. I tried encoding a large-ish object similar to the one we are adding for the task status:
   
   ```
   S = #{
       <<"database">> => << <<"x">> || _ <- lists:seq(1,256) >>,
       <<"changes_done">> => 999999999999999999999999999999,
       <<"design_document">> => << <<"d">> || _ <- lists:seq(1,512) >>,
       <<"current_version_stamp">> => <<"1111111111111111111111111111111111111111">>,
       <<"db_version_stamp">> => <<"222222222222222222222222222222222222222222222">>
   }.
   ```
   
   ```
   >> size(iolist_to_binary(jiffy:encode(S))).
   984
   ```
   
   So we'd be adding at most another 1KB (in the worst case) to get closer to the 100KB limit if we save the status in the job data that doesn't seem that bad perhaps and it would simplify the design quite a bit more.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662552156


   I'm actually counting three possible transactions between opening the database if its not in cache, one for the reading the design doc, then one for the other two bits of info.
   
   Can someone point me to the discussion where we decided to move away from storing active task info in the job state? I'm not seeing how this is a better approach or why the job state approach is unworkable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-658229486


   @tonysun83 
   
   Agree with @garrensmith, for reporting the job on all the nodes, it might be easier to think of something like this:
   
     * Add an `<<"active_task_status">>` section to each job's data section, already converted to json format.
   
     * In the http handler we would query all the active jobs and just emit those json objects. All active jobs are conveniently  located under a single `?ACTIVITY` subspace and can be queried with [get_active_since/3]( https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L358) (it was done that way on purpose to make it simple to query all active jobs in order to report _active_tasks easier 😉 ).
   
    * To get all the possible job types can use [get_types](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321). 
   
    * Since we'd be calling those outside of couch_jobs we could add a new function to `couch_jobs`, something like `get_active_jobs` perhaps.
    
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459531537



##########
File path: src/couch_jobs/src/couch_jobs.erl
##########
@@ -104,6 +105,12 @@ get_job_state(Tx, Type, JobId) when is_binary(JobId) ->
     end).
 
 
+get_active_jobs(Tx, Type) ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
+        couch_jobs_fdb:get_active_since(JTx, Type, {versionstamp, 0, 0})

Review comment:
       If this returns extra info besides JobIds, let's strip it out so it's just a list of JobIds. So it would basically be `[JobId1, JobId2, ...]`. It might make the caller's logic simpler so they don't have to unpack stuff they don't care about.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459665548



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,35 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0,
+    get_active_task_info/1,
+
+    update_active_task_info/2
+]).
+
+
+-define(ACTIVE_TASK_INFO, <<"active_task_info">>).
+
+
+get_active_tasks() ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined), fun(JTx) ->
+        Types = couch_jobs:get_types(JTx),
+        lists:foldl(fun(Type, TaskAcc) ->
+            JobIds = couch_jobs:get_active_jobs_ids(JTx, Type),
+            Tasks = lists:map(fun(JobId) ->

Review comment:
       For jobs that don't have an ?ACTIVE_TASK_INFO field, we might want to not return an empty object but just skip it from the result list altogether. Could use a `filtermap/2` maybe and then if TaskInfo is `#{}` return `false`, otherwise return `{true, TaskInfo}` http://erlang.org/doc/man/lists.html#filtermap-2

##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,35 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0,
+    get_active_task_info/1,
+
+    update_active_task_info/2
+]).
+
+
+-define(ACTIVE_TASK_INFO, <<"active_task_info">>).
+
+
+get_active_tasks() ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined), fun(JTx) ->
+        Types = couch_jobs:get_types(JTx),
+        lists:foldl(fun(Type, TaskAcc) ->
+            JobIds = couch_jobs:get_active_jobs_ids(JTx, Type),
+            Tasks = lists:map(fun(JobId) ->

Review comment:
       For jobs that don't have an `?ACTIVE_TASK_INFO` field, we might want to not return an empty object but just skip it from the result list altogether. Could use a `filtermap/2` maybe and then if TaskInfo is `#{}` return `false`, otherwise return `{true, TaskInfo}` http://erlang.org/doc/man/lists.html#filtermap-2




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459529561



##########
File path: src/couch_jobs/src/couch_jobs.erl
##########
@@ -19,6 +19,7 @@
     remove/3,
     get_job_data/3,
     get_job_state/3,
+    get_active_jobs/2,

Review comment:
       Since we are not returning job data, let's call it `get_active_job_ids/2`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458484992



##########
File path: src/fabric/src/fabric2_fdb.erl
##########
@@ -610,6 +612,21 @@ incr_stat(#{} = Db, Section, Key, Increment) when is_integer(Increment) ->
     erlfdb:add(Tx, BinKey, Increment).
 
 
+get_active_tasks() ->
+    ActiveTasks = lists:foldl(fun(Type, TaskAcc) ->
+        Tasks = couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined),
+            fun(JTx) ->
+                JobIds = couch_jobs:get_active_jobs(JTx, Type),
+                maps:map(fun(JobId, _) ->
+                    {ok, Data} = couch_jobs:get_job_data(JTx, Type, JobId),
+                    maps:get(<<"active_tasks_info">>, Data, #{})
+                end, JobIds)
+            end),
+        maps:merge(TaskAcc, Tasks)
+    end, #{}, ?ACTIVE_TASK_JOB_TYPES),
+    maps:values(ActiveTasks).
+
+

Review comment:
       @nickva created separate modules for active tasks in fabric and couch_views. I got rid of the need to have a `<<"active_tasks_info">>`. Let me know what you think.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458883410



##########
File path: src/couch_jobs/src/couch_jobs_fdb.erl
##########
@@ -33,6 +33,7 @@
 
     get_activity_vs/2,
     get_activity_vs_and_watch/2,
+    get_active_jobs/1,

Review comment:
       ya was rebasing to get the couch_jobs addition as a separate commit and forgot to undo this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r456974816



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +556,25 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+update_tasks(NumChanges, false) ->

Review comment:
       removed, no longer needed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r454465133



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -231,10 +231,24 @@ maybe_set_build_status(TxDb, Mrst1, _ViewVS, State) ->
 get_update_start_state(TxDb, Mrst, #{db_seq := undefined} = State) ->
     ViewVS = couch_views_fdb:get_creation_vs(TxDb, Mrst),
     ViewSeq = couch_views_fdb:get_update_seq(TxDb, Mrst),
+    DbSeq = fabric2_db:get_update_seq(TxDb),
+    DbVS = fabric2_fdb:seq_to_vs(DbSeq),
+    ViewVS1 = convert_version_stamp(ViewVS),
+    DbVS1 = convert_version_stamp(DbVS),
+
+    couch_task_status:add_task([
+        {type, indexer},
+        {database, Mrst#mrst.db_name},
+        {design_document, Mrst#mrst.idx_name},
+        {changes_done, 0},

Review comment:
       I didn't have a good solution to this. Mainly that the indexer process has to die first which means the active_task tuple is automatically removed from memory. I was thinking of writing a counter to fdb, but that opened up a new can of worms
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r460134570



##########
File path: src/couch_views/src/couch_views_util.erl
##########
@@ -276,3 +277,27 @@ is_paginated(#mrargs{page_size = PageSize}) when is_integer(PageSize) ->
 
 is_paginated(_) ->
     false.
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    #{
+        <<"type">> => <<"indexer">>,

Review comment:
       Ok that makes sense. We can maybe revisit that once we have active tasks for replication and see if its work slimming down.

##########
File path: src/couch_views/src/couch_views_util.erl
##########
@@ -276,3 +277,27 @@ is_paginated(#mrargs{page_size = PageSize}) when is_integer(PageSize) ->
 
 is_paginated(_) ->
     false.
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    #{
+        <<"type">> => <<"indexer">>,

Review comment:
       Ok that makes sense. We can maybe revisit that once we have active tasks for replication and see if its worth slimming down.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662582624


   @tonysun83:  Agree with @davisp, it's not worth having the registration system just for this.
   
   We could still hide the `<<"active_task_info">>` field name with a simple library function like 
   
   ```
   fabric2_active_tasks:update_active_task_info(JobData, ActiveTaskInfo) ->
        JobData#{?ACTIVE_TASK_INFO => ActiveTaskInfo}.
   ```
   
   And use the original design https://github.com/apache/couchdb/pull/3003#issuecomment-658229486
   
   >  I think @nickva thought that the registration callback would help with the sizing issue
   
   The registration would have helped with encapsulation, if we had a size issue and had to compute the section on the fly to avoid the http handle hard-coding knowledge about the replication and indexing apps.  But I think we determined computing on the fly was not as optimal, and it wasn't adding that much more to the job data size. If we ever do need to compute it on the fly, we could also come back add the complexity later.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-663171318


   @nickva addressed your latest comments and added a test. Will add more tests later and then rebase with a proper commit history.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662557731


   @davisp : Here was @garrensmith 's concern https://github.com/apache/couchdb/pull/3003#discussion_r457145064. But I think I found a compromise. We really just need to add two new items to the JobData info: <<"changes_done">> and <<"db_seq">>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r454468373



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -231,10 +231,24 @@ maybe_set_build_status(TxDb, Mrst1, _ViewVS, State) ->
 get_update_start_state(TxDb, Mrst, #{db_seq := undefined} = State) ->
     ViewVS = couch_views_fdb:get_creation_vs(TxDb, Mrst),
     ViewSeq = couch_views_fdb:get_update_seq(TxDb, Mrst),
+    DbSeq = fabric2_db:get_update_seq(TxDb),
+    DbVS = fabric2_fdb:seq_to_vs(DbSeq),
+    ViewVS1 = convert_version_stamp(ViewVS),
+    DbVS1 = convert_version_stamp(DbVS),
+
+    couch_task_status:add_task([
+        {type, indexer},
+        {database, Mrst#mrst.db_name},
+        {design_document, Mrst#mrst.idx_name},
+        {changes_done, 0},

Review comment:
       Historically, `changes_done` refers to how many changes have been processed by this view update. It's for judging how much work is left before the view is up to date. One way to think of this would be if you had a 10,000,000 docs processed for a view, and then went to process another 1,000,000 docs to bring the view up to date, you wouldn't want to start at 99% finished as that'd be misleading to operators.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 merged pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 merged pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458867968



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,45 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0,
+    register_tasks/1
+]).
+
+
+-include_lib("couch/include/couch_db.hrl").
+-include("fabric2.hrl").
+
+
+-define(ACTIVE_TASK_JOB_TYPES, [<<"views">>, <<"replication">>]).
+
+
+register_tasks(Mod) when is_atom(Mod) ->
+    ActiveTasks = lists:usort([Mod | registrations()]),
+    application:set_env(fabric, active_tasks, ActiveTasks).
+
+
+get_active_tasks() ->
+    ActiveTasks = lists:foldl(fun(Type, TaskAcc) ->
+        Tasks = couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined),
+            fun(JTx) ->
+                JobIds = couch_jobs:get_active_jobs(JTx, Type),
+                maps:map(fun(JobId, _) ->
+                    {ok, Data} = couch_jobs:get_job_data(JTx, Type, JobId),
+                    populate_active_tasks(Type, Data)

Review comment:
       For consistency wonder if we could pass the JobId in as well, so it might be `populate_active_tasks(Type, JobId, JobData)`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r457491889



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,35 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    VS = case LastSeq of

Review comment:
       The reason I had thought of it is that for replication tasks, for example, we we'd have to know to introspect a few other places to know how to add "source", "target", etc. So active tasks would have to start knowing about how to extract fields from various applications. Perhaps we'd want a simple registration like we have for indices in fabric2_index: https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_index.erl#L51-L53 - modules (applications) register a callback for _active_tasks handler and call it when building the response? Or, perhaps we can do that as round 2 if we see the values getting close to 100KB size.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r454200239



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +556,25 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+update_tasks(NumChanges, false) ->

Review comment:
       Can you rename this to `update_active_tasks`

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -231,10 +231,24 @@ maybe_set_build_status(TxDb, Mrst1, _ViewVS, State) ->
 get_update_start_state(TxDb, Mrst, #{db_seq := undefined} = State) ->
     ViewVS = couch_views_fdb:get_creation_vs(TxDb, Mrst),
     ViewSeq = couch_views_fdb:get_update_seq(TxDb, Mrst),
+    DbSeq = fabric2_db:get_update_seq(TxDb),
+    DbVS = fabric2_fdb:seq_to_vs(DbSeq),

Review comment:
       Can you move all the related active_tasks logic into its own function. Maybe something like `active_tasks_start`.
   

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +556,25 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+update_tasks(NumChanges, false) ->
+    [Changes] = couch_task_status:get([changes_done]),
+    Changes2 = Changes + NumChanges,
+    couch_task_status:update([{changes_done, Changes2}]);
+
+update_tasks(NumChanges, LastSeq) ->
+    [Changes] = couch_task_status:get([changes_done]),
+    Changes2 = Changes + NumChanges,
+    VS = fabric2_fdb:seq_to_vs(LastSeq),
+    VS1 = convert_version_stamp(VS),
+    couch_task_status:update([{changes_done, Changes2},
+        {current_version_stamp, VS1}]).
+
+
+convert_version_stamp({_, Stamp, _, DocNumber}) ->
+    VS = integer_to_list(Stamp) ++ "-" ++ integer_to_list(DocNumber),
+    list_to_binary(VS);
+
+convert_version_stamp(_) ->

Review comment:
       What would cause us to reach this state?

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -231,10 +231,24 @@ maybe_set_build_status(TxDb, Mrst1, _ViewVS, State) ->
 get_update_start_state(TxDb, Mrst, #{db_seq := undefined} = State) ->
     ViewVS = couch_views_fdb:get_creation_vs(TxDb, Mrst),
     ViewSeq = couch_views_fdb:get_update_seq(TxDb, Mrst),
+    DbSeq = fabric2_db:get_update_seq(TxDb),
+    DbVS = fabric2_fdb:seq_to_vs(DbSeq),
+    ViewVS1 = convert_version_stamp(ViewVS),
+    DbVS1 = convert_version_stamp(DbVS),
+
+    couch_task_status:add_task([
+        {type, indexer},
+        {database, Mrst#mrst.db_name},
+        {design_document, Mrst#mrst.idx_name},
+        {changes_done, 0},

Review comment:
       Will changes always be 0? What happens if an index is half-built and the indexer is starting up again?

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -368,6 +382,7 @@ write_docs(TxDb, Mrst, Docs, State) ->
     if LastSeq == false -> ok; true ->
         couch_views_fdb:set_update_seq(TxDb, Sig, LastSeq)
     end,
+    update_tasks(length(Docs), LastSeq),

Review comment:
       I think you can use `DocsNumber` since that is the count of the number of docs processed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-658229486


   @tonysun83 
   
   Agree with @garrensmith, for reporting the job on all the nodes, it might be easier to think of something like this:
   
     * Add an `<<"active_task_status">>` section to each job's data section, already converted to json format.
   
     * In the http handler we would query all the active jobs and just emit those json objects. All active jobs are conveniently  located under a single `?ACTIVITY` subspace and can be queried with [get_active_since/3] https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L358) (it was done that way on purpose to make it simple to query all active jobs in order to report _active_tasks easier 😉 ).
   
    * To get all the possible job types can use [get_types](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321). 
   
    * Since we'd be calling those outside of couch_jobs we could add a new function to `couch_jobs`, something like `get_active_jobs` perhaps.
    
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662571992


   @davisp:  agreed that it's gotten way more complex. I think @nickva thought that the registration callback would help with the sizing issue, and as a side benefit, add flexibility for future `active_task` types (I know replication is next). I'm leaning toward the simpler method like we did previously without the callback but would like  @nickva  to chime in here since he would know what we would need for replication.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459871565



##########
File path: src/couch_views/src/couch_views_util.erl
##########
@@ -276,3 +277,27 @@ is_paginated(#mrargs{page_size = PageSize}) when is_integer(PageSize) ->
 
 is_paginated(_) ->
     false.
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    #{
+        <<"type">> => <<"indexer">>,

Review comment:
       I think the idea here is that if have different types of tasks, we would need to know what kind of fields in the JobData to extract for each new task. DBSeq and ChangesDone is probably universal, but different tasks might have unique fields and would then require a code update to fabric_active_tasks each time.
   
   Ex: Replication has  
      "checkpointed_source_seq": 68585,
      "continuous": false
   
   and probably has other fields we want as part of active tasks, but we'd have to extract it in fabric2_active_tasks also. 
   
   If we keep one standard active_tasks_info across all JobData, then it enforces encapsulation. As Nick showed up, it's only ~1KB.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459656918



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,35 @@
+-module(fabric2_active_tasks).

Review comment:
       need to add apache license here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458229872



##########
File path: src/fabric/src/fabric2_fdb.erl
##########
@@ -610,6 +612,21 @@ incr_stat(#{} = Db, Section, Key, Increment) when is_integer(Increment) ->
     erlfdb:add(Tx, BinKey, Increment).
 
 
+get_active_tasks() ->
+    ActiveTasks = lists:foldl(fun(Type, TaskAcc) ->
+        Tasks = couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined),
+            fun(JTx) ->
+                JobIds = couch_jobs:get_active_jobs(JTx, Type),
+                maps:map(fun(JobId, _) ->
+                    {ok, Data} = couch_jobs:get_job_data(JTx, Type, JobId),
+                    maps:get(<<"active_tasks_info">>, Data, #{})
+                end, JobIds)
+            end),
+        maps:merge(TaskAcc, Tasks)
+    end, #{}, ?ACTIVE_TASK_JOB_TYPES),
+    maps:values(ActiveTasks).
+
+

Review comment:
       Wonder if we could make a separate module for active_tasks, since most things in `fabric2_fdb` deal with db handles and we have a few indexing or db expiration modules that have the `fabric2_<...>` pattern already. 
   
   It could be something like `fabric2_active_tasks` and it would encapsulate both the `?ACTIVE_TASK_JOB_TYPES` list, as well as be the only place that knows about <<"active_tasks_info">>.
   
   So it might have a get_active_tasks() function and also another one that update a job's data with an active task status, like say `update_active_task_info(#{} = JobData, #{} = ActiveTaskInfo)` and it would just do `JobData#{<<"active_tasks_info">> => ActiveTaskInfo}`. Another alternative would be to have an `?ACTIVE_TASK_INFO` define in fabric2.hrl. Just a small ergonomic update to make we don't mistype <<"active_tasks_info">> name somewhere...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459534093



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,28 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0
+]).
+
+
+-include_lib("couch/include/couch_db.hrl").
+-include("fabric2.hrl").

Review comment:
       Are these used?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r454464009



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +556,25 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+update_tasks(NumChanges, false) ->
+    [Changes] = couch_task_status:get([changes_done]),
+    Changes2 = Changes + NumChanges,
+    couch_task_status:update([{changes_done, Changes2}]);
+
+update_tasks(NumChanges, LastSeq) ->
+    [Changes] = couch_task_status:get([changes_done]),
+    Changes2 = Changes + NumChanges,
+    VS = fabric2_fdb:seq_to_vs(LastSeq),
+    VS1 = convert_version_stamp(VS),
+    couch_task_status:update([{changes_done, Changes2},
+        {current_version_stamp, VS1}]).
+
+
+convert_version_stamp({_, Stamp, _, DocNumber}) ->
+    VS = integer_to_list(Stamp) ++ "-" ++ integer_to_list(DocNumber),
+    list_to_binary(VS);
+
+convert_version_stamp(_) ->

Review comment:
       initially, when we create the versionstamp it's `not_found`:
   `ViewVS = couch_views_fdb:get_creation_vs(TxDb, Mrst),`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458237167



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,35 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    VS = case LastSeq of

Review comment:
       @garrensmith @nickva how about leaving just `changes_done` to the job_data and then computing the rest on the fly? It seems we should be adding an entirely new fabric_active_tasks module which allows the extractions to reside int here

##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,35 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    VS = case LastSeq of

Review comment:
       @garrensmith @nickva how about leaving just `changes_done` to the job_data and then computing the rest on the fly? It seems we should be adding an entirely new fabric_active_tasks module which allows the extractions to reside in there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458230142



##########
File path: src/couch_jobs/src/couch_jobs_fdb.erl
##########
@@ -45,6 +45,7 @@
     get_jtx/0,
     get_jtx/1,
     tx/2,
+    init_jtx/1,

Review comment:
       Don't think we need this one anymore




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-658229486


   @tonysun83 
   
   Agree with @garrensmith, for reporting the job on all the nodes, it might be easier to think of something like this:
   
     * Having an `<<"active_task_status">>` section to each job's data section, already converted to json format.
   
     * In the http handler we would query all the active jobs and just emit those json objects. All active jobs are conveniently already located under a single `?ACTIVITY` subspace and can be queried with [get_active_since/3] https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L358) (it was done that way on purpose to make it simple to query all active jobs in order to report _active_tasks easier 😉 ).
   
    * To get all the possible job types can use [get_types](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321). 
   
    * Since we'd be calling those outside of couch_jobs we could add a new function to `couch_jobs`, something like `get_active_jobs` perhaps.
    
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r456971242



##########
File path: src/chttpd/src/chttpd_misc.erl
##########
@@ -31,12 +31,14 @@
 
 -include_lib("couch/include/couch_db.hrl").
 -include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("couch_views/include/couch_views.hrl").

Review comment:
       probably don't need this anymore




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-658229486


   @tonysun83 
   
   Agree with @garrensmith, for reporting the job on all the nodes, it might be easier to think of something like this:
   
     * Having an `<<"active_task_status">>` section to each job's data section, already converted to json format.
   
     * In the http handler we would query all the active jobs and just emit those json objects. All active jobs are conveniently  located under a single `?ACTIVITY` subspace and can be queried with [get_active_since/3] https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L358) (it was done that way on purpose to make it simple to query all active jobs in order to report _active_tasks easier 😉 ).
   
    * To get all the possible job types can use [get_types](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321). 
   
    * Since we'd be calling those outside of couch_jobs we could add a new function to `couch_jobs`, something like `get_active_jobs` perhaps.
    
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458877554



##########
File path: src/couch_views/src/couch_views_active_tasks.erl
##########
@@ -0,0 +1,87 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_active_tasks).
+
+
+-export([
+    populate_active_tasks/2
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+populate_active_tasks(<<"views">>, Data)  ->
+    #{
+        <<"db_name">> := DbName,
+        <<"ddoc_id">> := DDocId,
+        <<"changes_done">> := ChangesDone,
+        <<"db_uuid">> := DbUUID
+    } = Data,
+    
+    {ok, Db} = try
+        fabric2_db:open(DbName, [?ADMIN_CTX, {uuid, DbUUID}])
+    catch error:database_does_not_exist ->
+        {ok, not_found}
+    end,
+
+    {ok, DDoc} = case fabric2_db:open_doc(Db, DDocId) of
+        {ok, DDoc0} ->
+            {ok, DDoc0};
+        {not_found, _} ->
+            {ok, not_found}
+    end,

Review comment:
       A bit worried about this. We could be starting two transactions here for each task item. One for opening the db and other for opening the doc. With a few hundred active tasks and someone say polling `_active_tasks` for monitoring that could add a lot of load to the system just to add `VStamp` and `DbStamp` to the data.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
davisp commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662563321


   @tonysun83 The history of this PR is super confusing to me. The last update appears to have reverted to storing all of the info back in the job state which means that this new populate_active_tasks is just "format_job_as_active_task". Which is fine I suppose. Though it seems like it'd be a lot more straightforward for the jobs to just store what they want their active_task output to look like in the first place?
   
   Adding a whole registration/callback system for this seems much more complex than the original plan to just spit out the job data as JSON.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458230691



##########
File path: src/couch_jobs/src/couch_jobs.erl
##########
@@ -104,6 +105,12 @@ get_job_state(Tx, Type, JobId) when is_binary(JobId) ->
     end).
 
 
+get_active_jobs(Tx, Type) ->
+    couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
+        couch_jobs_fdb:get_active_since(JTx, Type, {versionstamp, 0, 0})
+    end).
+
+

Review comment:
       Let's make this a separate commit for this if possible




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458855658



##########
File path: src/couch_jobs/src/couch_jobs_fdb.erl
##########
@@ -355,6 +356,13 @@ get_activity_vs_and_watch(#{jtx := true} = JTx, Type) ->
     end.
 
 
+get_active_jobs(Type) ->
+    fabric2_fdb:transactional(fun(Tx) ->
+        JTx = init_jtx(Tx),                                                        
+        get_active_since(JTx, Type, {versionstamp, 0, 0})
+    end).
+
+

Review comment:
       We are calling get_active_since directly from couch_jobs module so we don't need this function anymore.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459533390



##########
File path: src/fabric/src/fabric2_active_tasks.erl
##########
@@ -0,0 +1,28 @@
+-module(fabric2_active_tasks).
+
+
+-export([
+    get_active_tasks/0
+]).
+
+
+-include_lib("couch/include/couch_db.hrl").
+-include("fabric2.hrl").
+
+
+-define(ACTIVE_TASK_JOB_TYPES, [<<"views">>, <<"replication">>]).

Review comment:
       To avoid hard-coding the types we could just let any job expose the active_task_info object and we'd automatically display it. So in the get_active_tasks we could query all the types first then call get_active_jobs(JTx, Type).
   
   To get all the types we could expose https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321 in `couch_jobs` as 
   
   ```
   get_types(Tx) ->
        couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
           couch_jobs_fdb:get_types(JTx)
       end).
   ```
   
   This fabric app doesn't have to know anything about replication or views and simply reports status for all the jobs which have an `"active_task_info"` object.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459871565



##########
File path: src/couch_views/src/couch_views_util.erl
##########
@@ -276,3 +277,27 @@ is_paginated(#mrargs{page_size = PageSize}) when is_integer(PageSize) ->
 
 is_paginated(_) ->
     false.
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    #{
+        <<"type">> => <<"indexer">>,

Review comment:
       I think the idea here is that if have different types of tasks, we would need to know what kind of fields in the JobData to extract for each new task. DBSeq and ChangesDone is probably universal, but different tasks might have unique fields and would then require an update to fabric_active_tasks each time.
   
   Ex: Replication has  
      "checkpointed_source_seq": 68585,
      "continuous": false
   
   and probably has other fields we want as part of active tasks, but we'd have to extract it in fabric2_active_tasks also. 
   
   If we keep one standard active_tasks_info across all JobData, then it enforces encapsulation. As Nick showed up, it's only ~1KB.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva edited a comment on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva edited a comment on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662529977


   @tonysun83 looks better! Good improvement. I made a few comments-in line.
   
   One part that I am not sure if it is worth starting two transactions per task item just to compute version stamps and maybe a few other fields. That seems quite expensive if, say, someone is polling status on a cluster with hundreds of tasks, given that jobs already have the dbs open and all that data available to start with. The reason we decided to do that was to ensure we are below 100KB for JobData, I think?
   
   We'd have to make a convincing case that our active_jobs status section in each JobData would have a chance of pushing it over the 100KB value limit. I tried encoding a large-ish object similar to the one we are adding for the task status:
   
   ```
   S = #{
       <<"database">> => << <<"x">> || _ <- lists:seq(1,256) >>,
       <<"changes_done">> => 999999999999999999999999999999,
       <<"design_document">> => << <<"d">> || _ <- lists:seq(1,512) >>,
       <<"current_version_stamp">> => <<"1111111111111111111111111111111111111111">>,
       <<"db_version_stamp">> => <<"222222222222222222222222222222222222222222222">>
   }.
   ```
   
   ```
   >> size(iolist_to_binary(jiffy:encode(S))).
   984
   ```
   
   So we'd be adding at most another 1KB (in the worst case) to get closer to the 100KB limit if we save the status in the job data that doesn't seem that bad perhaps and it would simplify the design quite a bit more.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459524216



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,27 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    {[
+        {<<"type">>, <<"indexer">>},
+        {<<"database">>, DbName},
+        {<<"changes_done">>, ChangesDone},
+        {<<"design_document">>, DDocId},
+        {<<"current_version_stamp">>, convert_seq_to_stamp(LastSeq)},
+        {<<"db_version_stamp">>, convert_seq_to_stamp(DBSeq)}
+    ]}.

Review comment:
       Minor nit, if you want this could be a map I think. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458907536



##########
File path: src/couch_views/src/couch_views_active_tasks.erl
##########
@@ -0,0 +1,87 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_active_tasks).
+
+
+-export([
+    populate_active_tasks/2
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+populate_active_tasks(<<"views">>, Data)  ->
+    #{
+        <<"db_name">> := DbName,
+        <<"ddoc_id">> := DDocId,
+        <<"changes_done">> := ChangesDone,
+        <<"db_uuid">> := DbUUID
+    } = Data,
+    
+    {ok, Db} = try
+        fabric2_db:open(DbName, [?ADMIN_CTX, {uuid, DbUUID}])
+    catch error:database_does_not_exist ->
+        {ok, not_found}
+    end,
+
+    {ok, DDoc} = case fabric2_db:open_doc(Db, DDocId) of
+        {ok, DDoc0} ->
+            {ok, DDoc0};
+        {not_found, _} ->
+            {ok, not_found}
+    end,

Review comment:
       oh, `view_seq` already exists so only really just need to add `db_seq`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459666937



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +562,4 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+

Review comment:
       Whitespace-only change, let's skip it if we can.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r459862301



##########
File path: src/chttpd/src/chttpd_misc.erl
##########
@@ -294,11 +294,8 @@ dbs_info_callback({error, Reason}, #vacc{resp = Resp0} = Acc) ->
 
 handle_task_status_req(#httpd{method='GET'}=Req) ->
     ok = chttpd:verify_is_server_admin(Req),
-    {Replies, _BadNodes} = gen_server:multi_call(couch_task_status, all),
-    Response = lists:flatmap(fun({Node, Tasks}) ->
-        [{[{node,Node} | Task]} || Task <- Tasks]
-    end, Replies),
-    send_json(Req, lists:sort(Response));
+    ActiveTasks = fabric2_active_tasks:get_active_tasks(),

Review comment:
       Awesome. This is great. 

##########
File path: src/couch_views/src/couch_views_util.erl
##########
@@ -276,3 +277,27 @@ is_paginated(#mrargs{page_size = PageSize}) when is_integer(PageSize) ->
 
 is_paginated(_) ->
     false.
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    #{
+        <<"type">> => <<"indexer">>,

Review comment:
       I still don't understand why we need this full active task map in jobs? The only values that we don't already store in the jobs data is `DbSeq` and `ChangesDone`. Couldn't you move those two fields into the job_data and then when you call `fabric2_active_tasks:get_active_tasks` create the full map to return to the active task info based off the job data. This map is a duplication of everything in the job data. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r457483744



##########
File path: src/chttpd/src/chttpd_misc.erl
##########
@@ -294,11 +296,25 @@ dbs_info_callback({error, Reason}, #vacc{resp = Resp0} = Acc) ->
 
 handle_task_status_req(#httpd{method='GET'}=Req) ->
     ok = chttpd:verify_is_server_admin(Req),
-    {Replies, _BadNodes} = gen_server:multi_call(couch_task_status, all),
-    Response = lists:flatmap(fun({Node, Tasks}) ->
-        [{[{node,Node} | Task]} || Task <- Tasks]
-    end, Replies),
-    send_json(Req, lists:sort(Response));
+    Jobs = lists:foldl(fun(Type, JobsAcc) ->
+        NewJobs = couch_jobs_fdb:get_active_jobs(Type),
+        maps:merge(NewJobs, JobsAcc)
+    end, #{}, ?ACTIVE_TASK_JOB_TYPES),
+    ActiveTasks = maps:map(fun(JobId, _) ->
+        fabric2_fdb:transactional(fun(Tx) ->

Review comment:
       We could move most of this to a fabric2, could be in fabric2_active_tasks or similar though we could also make it smaller if say we get the list of active jobs and their data in the same transaction say one per type. Then we know the data is there and don't have to check for `{error, _}`:
   
   ```
       maps:map(fun(Type) ->
           couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(undefined), fun(JTx) ->
               JobIds = couch_jobs:get_active_jobs(JTx, Type),
               lists:foldl(fun(TasksAcc) ->
                  {ok, Data} = couch_jobs:get_job_data(JTx, Type, JobId),
                  maps:merge(maps:get(<<"active_tasks_info">>, Data, #{}), TasksAcc)
              end, #{}, JobIds)
       end, ?ACTIVE_TASK_JOB_TYPES)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] davisp commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
davisp commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r454471106



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +556,25 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+update_tasks(NumChanges, false) ->
+    [Changes] = couch_task_status:get([changes_done]),
+    Changes2 = Changes + NumChanges,
+    couch_task_status:update([{changes_done, Changes2}]);
+
+update_tasks(NumChanges, LastSeq) ->
+    [Changes] = couch_task_status:get([changes_done]),
+    Changes2 = Changes + NumChanges,
+    VS = fabric2_fdb:seq_to_vs(LastSeq),
+    VS1 = convert_version_stamp(VS),
+    couch_task_status:update([{changes_done, Changes2},
+        {current_version_stamp, VS1}]).
+
+
+convert_version_stamp({_, Stamp, _, DocNumber}) ->

Review comment:
       The third tuple element here is the batch id which while generally 0 probably shouldn't be something we should leave out for consistency's sake.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-662582624


   @tonysun83:  Agree with @davisp, it's not worth having the registration system just for this.
   
   We could still hide the `<<"active_task_info">>` field name with a simple library function like 
   
   ```
   fabric2_active_tasks:update_active_task_info(JobData, ActiveTaskInfo) ->
        JobData#{?ACTIVE_TASK_INFO => ActiveTaskInfo}.
   ```
   
   And use the original design.
   
   >  I think @nickva thought that the registration callback would help with the sizing issue
   
   The registration would have helped with encapsulation, if we had a size issue and had to compute the section on the fly to avoid the http handle hard-coding knowledge about the replication and indexing apps.  But I think we determined computing on the fly was not as optimal, and it wasn't adding that much more to the job data size. If we ever do need to compute it on the fly, we could also come back add the complexity later.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458855210



##########
File path: src/couch_jobs/src/couch_jobs_fdb.erl
##########
@@ -33,6 +33,7 @@
 
     get_activity_vs/2,
     get_activity_vs_and_watch/2,
+    get_active_jobs/1,

Review comment:
       This is not needed anymore? We moves the function to `couch_jobs`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] garrensmith commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
garrensmith commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458259562



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,35 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    VS = case LastSeq of

Review comment:
       That sounds good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r457468173



##########
File path: src/couch_jobs/src/couch_jobs_fdb.erl
##########
@@ -355,6 +357,13 @@ get_activity_vs_and_watch(#{jtx := true} = JTx, Type) ->
     end.
 
 
+get_active_jobs(Type) ->

Review comment:
       Since we are calling it from the outside of couch_jobs let's add this function to couch_jobs module.  There it could look like:
   
   ```
   get_active_jobs(Tx, Type) ->
       couch_jobs_fdb:tx(couch_jobs_fdb:get_jtx(Tx), fun(JTx) ->
           couch_jobs_fdb:get_active_since(JTx, Type, {versionstamp, 0, 0}})
       end).
   ```
   
   `Tx` could then be `undefined`, a plain transaction from erldb or a TxDb one from `fabric2_db`
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458881273



##########
File path: src/couch_jobs/src/couch_jobs_fdb.erl
##########
@@ -355,6 +356,13 @@ get_activity_vs_and_watch(#{jtx := true} = JTx, Type) ->
     end.
 
 
+get_active_jobs(Type) ->
+    fabric2_fdb:transactional(fun(Tx) ->
+        JTx = init_jtx(Tx),                                                        
+        get_active_since(JTx, Type, {versionstamp, 0, 0})
+    end).
+
+

Review comment:
       ahhhh I was rebasing and somehow this came back.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r457491889



##########
File path: src/couch_views/src/couch_views_indexer.erl
##########
@@ -541,3 +559,35 @@ key_size_limit() ->
 
 value_size_limit() ->
     config:get_integer("couch_views", "value_size_limit", ?VALUE_SIZE_LIMIT).
+
+
+active_tasks_info(ChangesDone, DbName, DDocId, LastSeq, DBSeq) ->
+    VS = case LastSeq of

Review comment:
       The reason I had thought of it is that for replication tasks, for example, we we'd have to know to introspect a few other places specific to replicator to add "source", "target", etc. So active tasks would have to start knowing about how to extract fields from various applications. Perhaps we'd want a simple registration like we have for indices in fabric2_index: https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/fabric/src/fabric2_index.erl#L51-L53 - modules (applications) register a callback for _active_tasks handler and call it when building the response? Or, perhaps we can do that as round 2 if we see the values getting close to 100KB size.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
nickva commented on pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#issuecomment-658229486


   @tonysun83 
   
   Agree with @garrensmith, for reporting the job on all the nodes, it might be easier to think of something like this:
   
     * Having an `<<"active_task_status">>` section to each job's data section, already converted to json format.
   
     * In the http handler we would query all the active jobs and just emit those json objects. All active jobs are conveniently are located under a single `?ACTIVITY` subspace using [get_active_since/3] https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L358) (it was done that way on purpose to make it simple to query all active jobs in order to report _active_tasks easier 😉 ).
   
    * To get all the possible job types can use [get_types](https://github.com/apache/couchdb/blob/prototype/fdb-layer/src/couch_jobs/src/couch_jobs_fdb.erl#L321). 
   
    * Since we'd be calling those outside of couch_jobs we could add a new function to `couch_jobs`, something like `get_active_jobs` perhaps.
    
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3003: add active_tasks for view builds using version stamps

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3003:
URL: https://github.com/apache/couchdb/pull/3003#discussion_r458893984



##########
File path: src/couch_views/src/couch_views_active_tasks.erl
##########
@@ -0,0 +1,87 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+
+-module(couch_views_active_tasks).
+
+
+-export([
+    populate_active_tasks/2
+]).
+
+
+-include("couch_views.hrl").
+-include_lib("couch/include/couch_db.hrl").
+-include_lib("couch_mrview/include/couch_mrview.hrl").
+-include_lib("fabric/include/fabric2.hrl").
+
+
+populate_active_tasks(<<"views">>, Data)  ->
+    #{
+        <<"db_name">> := DbName,
+        <<"ddoc_id">> := DDocId,
+        <<"changes_done">> := ChangesDone,
+        <<"db_uuid">> := DbUUID
+    } = Data,
+    
+    {ok, Db} = try
+        fabric2_db:open(DbName, [?ADMIN_CTX, {uuid, DbUUID}])
+    catch error:database_does_not_exist ->
+        {ok, not_found}
+    end,
+
+    {ok, DDoc} = case fabric2_db:open_doc(Db, DDocId) of
+        {ok, DDoc0} ->
+            {ok, DDoc0};
+        {not_found, _} ->
+            {ok, not_found}
+    end,

Review comment:
       Seems we have two issues
   1) exceeding the 100 kb limit with JobData
   2) too many read requests when active_task processes exist.
   
   The only 3 things we need are really `changes_done`, `view_seq` and `db_seq`. Wonder if adding just those JobData would an okay compromise?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org