You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by ab...@apache.org on 2019/01/09 18:11:11 UTC

[kudu] 02/04: Reduce severity of "Error trying to read ahead of the log" log message

This is an automated email from the ASF dual-hosted git repository.

abukor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 6103f8786358df1766ba41c8d8dc2feb9c8c6f53
Author: Will Berkeley <wd...@gmail.org>
AuthorDate: Tue Jan 8 15:37:57 2019 -0800

    Reduce severity of "Error trying to read ahead of the log" log message
    
    In clusters under load, it is typical to see the ERROR log dominated by
    messages like the following:
    
    E1221 09:09:13.869918 143384 consensus_queue.cc:440] T 1d030514317942ec9d7796a8a029dace P a4eea0affa4545879c8988b24d8cb2bb [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 6620 is ahead of the local log (next sequential op: 6620). Destination peer: Peer: c0a2e34b708845efb8f090459815272c, Is new: false, Last received: 2995.6619, Next index: 6621, Last known committed idx: 6620, Last exchange result: ERROR, Needs tablet copy: false
    
    This message is logged at the error level, and looks a little bit scary,
    but it is in fact harmless. Here's what happens:
    
    1. A leader neglects its duties and the followers elect a new leader.
    2. The new leader manages to replicate more ops (usually just the NO_OP
       asserting leadership).
    3. The leader of the previous term tries to replicate an op to a peer in
       the new term.
    4. From the response, it founds out that
       a. It is in an earlier term, so it should step down and increment its
          term.
       b. The last op the peer saw is (leader's index + k) for some k > 0
          (usually k = 1). So the leader will attempt to send up ops of
          index up through (leader's index + k).
    5. The term change is asynchronous, and before it happens the leader
       tries to prepare a new request to the peer whose log is ahead of the
       local log. This causes the ERROR message.
    6. The term change happens. The leader steps down, and life goes on.
    
    Note that the leader should also have received a VoteRequest with the
    new term and an UpdateConsensus with the new term from the leader. In
    general, this message appears only when the leader is under enough
    stress to lose its leadership and be unable to service some consensus
    RPCs in a timely manner. It is not in an of itself a problem, and it's a
    decent indicator of stress on the leader, so I am leaving the message
    but downgrading it to INFO level.
    
    See KUDU-1078 for more information about this situation, especially its
    previous causes (which were at one time actually harmful).
    
    Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
    Reviewed-on: http://gerrit.cloudera.org:8080/12185
    Reviewed-by: Adar Dembo <ad...@cloudera.com>
    Reviewed-by: Mike Percy <mp...@apache.org>
    Tested-by: Will Berkeley <wd...@gmail.com>
---
 src/kudu/consensus/consensus_queue.cc | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/kudu/consensus/consensus_queue.cc b/src/kudu/consensus/consensus_queue.cc
index 2c64e1c..c618002 100644
--- a/src/kudu/consensus/consensus_queue.cc
+++ b/src/kudu/consensus/consensus_queue.cc
@@ -696,11 +696,12 @@ Status PeerMessageQueue::RequestForPeer(const string& uuid,
       }
       if (s.IsIncomplete()) {
         // IsIncomplete() means that we tried to read beyond the head of the log
-        // (in the future). See KUDU-1078.
-        LOG_WITH_PREFIX_UNLOCKED(ERROR) << "Error trying to read ahead of the log "
-                                        << "while preparing peer request: "
-                                        << s.ToString() << ". Destination peer: "
-                                        << peer_copy.ToString();
+        // (in the future). This is usually a sign that this peer is under load
+        // and is about to step down as leader. See KUDU-1078.
+        LOG_WITH_PREFIX_UNLOCKED(INFO) << "Error trying to read ahead of the log "
+                                       << "while preparing peer request: "
+                                       << s.ToString() << ". Destination peer: "
+                                       << peer_copy.ToString();
         return s;
       }
       LOG_WITH_PREFIX_UNLOCKED(FATAL) << "Error reading the log while preparing peer request: "