Posted to commits@couchdb.apache.org by va...@apache.org on 2021/04/06 14:25:00 UTC

[couchdb] branch fix-retry-handling-in-jobs-type-monitor created (now ca355eb)

This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a change to branch fix-retry-handling-in-jobs-type-monitor
in repository https://gitbox.apache.org/repos/asf/couchdb.git.


      at ca355eb  Retryable error fixes in couch_jobs_type_monitor

This branch includes the following new commits:

     new ca355eb  Retryable error fixes in couch_jobs_type_monitor

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  Revisions
listed as "add" were already present in the repository and have only
been added to this reference.


[couchdb] 01/01: Retryable error fixes in couch_jobs_type_monitor

Posted by va...@apache.org.

vatamane pushed a commit to branch fix-retry-handling-in-jobs-type-monitor
in repository https://gitbox.apache.org/repos/asf/couchdb.git

commit ca355ebaf9d8fa8afbc2da56b0ccee2e29964f13
Author: Nick Vatamaniuc <va...@gmail.com>
AuthorDate: Tue Apr 6 10:04:17 2021 -0400

    Retryable error fixes in couch_jobs_type_monitor
    
    This continues the improvements to retryable error handling started in
    https://github.com/apache/couchdb/pull/3460. Here we apply the same logic we
    already have for the `erlfdb:wait/2` call in
    https://github.com/apache/couchdb/blob/main/src/couch_jobs/src/couch_jobs_type_monitor.erl#L55-L57
    to the `get_vs_and_watch/1` call.
    
    couch_jobs_type_monitor is meant to be linked to its parent and to run in a
    continuous loop for as long as the parent process is alive. If FDB becomes
    unavailable, the process we are linked to, or another main component (up to
    the whole application), should crash and fail rather than the type monitor
    itself. Still, to avoid running in a tight loop, we sleep for the holdoff
    interval before recursing. Typical holdoff values are around 50-100 msec.
---
 src/couch_jobs/src/couch_jobs_type_monitor.erl | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/src/couch_jobs/src/couch_jobs_type_monitor.erl b/src/couch_jobs/src/couch_jobs_type_monitor.erl
index a62eb62..b58f34e 100644
--- a/src/couch_jobs/src/couch_jobs_type_monitor.erl
+++ b/src/couch_jobs/src/couch_jobs_type_monitor.erl
@@ -81,7 +81,20 @@ notify(#st{} = St) ->
     St#st{timestamp = Now}.
 
 
-get_vs_and_watch(#st{jtx = JTx, type = Type}) ->
-    couch_jobs_fdb:tx(JTx, fun(JTx1) ->
-        couch_jobs_fdb:get_activity_vs_and_watch(JTx1, Type)
-    end).
+get_vs_and_watch(#st{} = St) ->
+    #st{jtx = JTx, type = Type, holdoff = HoldOff} = St,
+    try
+        couch_jobs_fdb:tx(JTx, fun(JTx1) ->
+            couch_jobs_fdb:get_activity_vs_and_watch(JTx1, Type)
+        end)
+    catch
+        error:{erlfdb_error, ?ERLFDB_TRANSACTION_TIMED_OUT} ->
+            timer:sleep(HoldOff),
+            get_vs_and_watch(St);
+        error:{erlfdb_error, Code} when ?ERLFDB_IS_RETRYABLE(Code) ->
+            timer:sleep(HoldOff),
+            get_vs_and_watch(St);
+        error:{timeout, _} ->
+            timer:sleep(HoldOff),
+            get_vs_and_watch(St)
+    end.
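
The sleep-then-recurse pattern in the patch above can be tried in isolation with a minimal sketch like the one below. It is not part of the commit and does not use erlfdb: the module name `holdoff_retry_sketch`, the helper `with_holdoff/3`, the `IsRetryable` predicate, and the use of error code 1031 as a stand-in for a transaction timeout are all assumptions made for illustration.

```erlang
-module(holdoff_retry_sketch).
-export([with_holdoff/3, demo/0]).

%% Hypothetical helper, not part of couch_jobs. Runs Fun(); if it raises an
%% error that IsRetryable/1 accepts, sleeps HoldOff msec and tries again.
%% Any other error is re-raised with its original stack trace.
with_holdoff(Fun, IsRetryable, HoldOff) ->
    try
        Fun()
    catch
        error:Reason:Stack ->
            case IsRetryable(Reason) of
                true ->
                    %% Sleep first so a persistent failure does not turn
                    %% into a tight loop.
                    timer:sleep(HoldOff),
                    with_holdoff(Fun, IsRetryable, HoldOff);
                false ->
                    erlang:raise(error, Reason, Stack)
            end
    end.

%% Simulates an operation that fails twice with a retryable error and then
%% succeeds on the third attempt.
demo() ->
    Attempts = counters:new(1, []),
    Fun = fun() ->
        counters:add(Attempts, 1, 1),
        case counters:get(Attempts, 1) of
            N when N < 3 -> erlang:error({erlfdb_error, 1031});
            N -> {ok, N}
        end
    end,
    %% Illustrative predicate only; the real patch relies on the erlfdb
    %% macros rather than a hand-written list of codes.
    IsRetryable = fun
        ({erlfdb_error, 1031}) -> true;
        ({timeout, _}) -> true;
        (_) -> false
    end,
    with_holdoff(Fun, IsRetryable, 50).
```

Calling `holdoff_retry_sketch:demo()` returns `{ok, 3}` after two pauses of roughly 50 msec each. Like the patch, the sketch retries without an upper bound, on the assumption that a linked parent process is responsible for crashing if the backend stays unavailable.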