Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/01/08 23:51:26 UTC

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12185


Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................

Reduce severity of "Error trying to read ahead of the log" log message

In clusters under load, it is typical to see the ERROR log dominated by
messages like the following:

E1221 09:09:13.869918 143384 consensus_queue.cc:440] T 1d030514317942ec9d7796a8a029dace P a4eea0affa4545879c8988b24d8cb2bb [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 6620 is ahead of the local log (next sequential op: 6620). Destination peer: Peer: c0a2e34b708845efb8f090459815272c, Is new: false, Last received: 2995.6619, Next index: 6621, Last known committed idx: 6620, Last exchange result: ERROR, Needs tablet copy: false

This message is logged at the error level, and looks a little bit scary,
but it is in fact harmless. Here's what happens:

1. A leader neglects its duties and the followers elect a new leader.
2. The new leader manages to replicate more ops (usually just the NO_OP
   asserting leadership).
3. The leader of the previous term tries to replicate an op to a peer in
   the new term.
4. From the response, it finds out that
   a. It is in an earlier term, so it should step down and increment its
      term.
   b. The next op the peer expects is (leader's index + k) for some
      k > 0 (usually k = 1).
5. The term change is asynchronous, and before it happens the leader
   tries to prepare a new request to the peer whose log is ahead of the
   local log. This causes the ERROR message; the sketch after this list
   works through the index arithmetic with the numbers from the quoted
   log line.
6. The term change happens. The leader steps down, and life goes on.
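
To make steps 4 and 5 concrete, here is a small standalone sketch using the
numbers from the quoted log line (the variable names and the check are
illustrative simplifications, not the actual consensus_queue.cc logic):

    #include <cstdint>
    #include <iostream>

    int main() {
      // The deposed leader's own log ends at index 6619, so the next op it
      // could append or read locally is 6620 ("next sequential op: 6620").
      int64_t local_last_index = 6619;
      int64_t next_sequential_op = local_last_index + 1;   // 6620

      // From the peer's response the old leader learns about ops up through
      // local_last_index + k (here k = 1: the new leader's NO_OP at 6620),
      // so it ends up trying to read op 6620 from its own log to build the
      // next request.
      int64_t requested_index = local_last_index + 1;      // 6620

      if (requested_index >= next_sequential_op) {
        // The op is not in the local log yet; this is the Incomplete status
        // behind "Op with index 6620 is ahead of the local log".
        std::cout << "Op with index " << requested_index
                  << " is ahead of the local log (next sequential op: "
                  << next_sequential_op << ")" << std::endl;
      }
      return 0;
    }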

Note that the leader should also have received a VoteRequest with the
new term and an UpdateConsensus with the new term from the new leader. In
general, this message appears only when the leader is under enough
stress to lose its leadership and be unable to service some consensus
RPCs in a timely manner. It is not in and of itself a problem, and it's a
decent indicator of stress on the leader, so I am leaving the message
but downgrading it to INFO level.
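
For reference, here is a minimal sketch of what such a severity downgrade can
look like with glog (the helper name, parameters, and message assembly are
illustrative placeholders, not the actual consensus_queue.cc code):

    #include <glog/logging.h>
    #include <string>

    // Illustrative placeholder: emit the benign "read ahead of the log"
    // condition at INFO severity instead of ERROR, keeping the text so it
    // can still serve as a hint that the (former) leader is under stress.
    void LogReadAheadOfLog(const std::string& log_prefix,
                           const std::string& status,
                           const std::string& peer) {
      // Before the change the equivalent message was emitted via LOG(ERROR).
      LOG(INFO) << log_prefix
                << "Error trying to read ahead of the log while preparing "
                << "peer request: " << status
                << ". Destination peer: " << peer;
    }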

See KUDU-1078 for more information about this situation, especially its
previous causes (which were at one time actually harmful).

Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
---
M src/kudu/consensus/consensus_queue.cc
1 file changed, 6 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/12185/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12185 )

Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 09 Jan 2019 02:44:38 +0000
Gerrit-HasComments: No

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12185 )

Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 09 Jan 2019 00:23:59 +0000
Gerrit-HasComments: No

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12185 )

Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................


Patch Set 2: Verified+1

Saw what looks like a flake in write_throttling-itest. I filed KUDU-2658 for it. I also saw a failure in all_types-scan-correctness-test but there were no artifacts available to investigate. They're definitely unrelated to this patch, so I'm manually +1'ing.


-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 09 Jan 2019 09:16:24 +0000
Gerrit-HasComments: No

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12185

to look at the new patch set (#2).

Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................

Reduce severity of "Error trying to read ahead of the log" log message

In clusters under load, it is typical to see the ERROR log dominated by
messages like the following:

E1221 09:09:13.869918 143384 consensus_queue.cc:440] T 1d030514317942ec9d7796a8a029dace P a4eea0affa4545879c8988b24d8cb2bb [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 6620 is ahead of the local log (next sequential op: 6620). Destination peer: Peer: c0a2e34b708845efb8f090459815272c, Is new: false, Last received: 2995.6619, Next index: 6621, Last known committed idx: 6620, Last exchange result: ERROR, Needs tablet copy: false

This message is logged at the error level, and looks a little bit scary,
but it is in fact harmless. Here's what happens:

1. A leader neglects its duties and the followers elect a new leader.
2. The new leader manages to replicate more ops (usually just the NO_OP
   asserting leadership).
3. The leader of the previous term tries to replicate an op to a peer in
   the new term.
4. From the response, it finds out that
   a. It is in an earlier term, so it should step down and increment its
      term.
   b. The last op the peer saw is (leader's index + k) for some k > 0
      (usually k = 1), so the leader will attempt to send ops with
      indexes up through (leader's index + k).
5. The term change is asynchronous, and before it happens the leader
   tries to prepare a new request to the peer whose log is ahead of the
   local log. This causes the ERROR message.
6. The term change happens. The leader steps down, and life goes on.

Note that the leader should also have received a VoteRequest with the
new term and an UpdateConsensus with the new term from the new leader. In
general, this message appears only when the leader is under enough
stress to lose its leadership and be unable to service some consensus
RPCs in a timely manner. It is not in and of itself a problem, and it's a
decent indicator of stress on the leader, so I am leaving the message
but downgrading it to INFO level.

See KUDU-1078 for more information about this situation, especially its
previous causes (which were at one time actually harmful).

Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
---
M src/kudu/consensus/consensus_queue.cc
1 file changed, 6 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/12185/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] Reduce severity of "Error trying to read ahead of the log" log message

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12185 )

Change subject: Reduce severity of "Error trying to read ahead of the log" log message
......................................................................

Reduce severity of "Error trying to read ahead of the log" log message

In clusters under load, it is typical to see the ERROR log dominated by
messages like the following:

E1221 09:09:13.869918 143384 consensus_queue.cc:440] T 1d030514317942ec9d7796a8a029dace P a4eea0affa4545879c8988b24d8cb2bb [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 6620 is ahead of the local log (next sequential op: 6620). Destination peer: Peer: c0a2e34b708845efb8f090459815272c, Is new: false, Last received: 2995.6619, Next index: 6621, Last known committed idx: 6620, Last exchange result: ERROR, Needs tablet copy: false

This message is logged at the error level, and looks a little bit scary,
but it is in fact harmless. Here's what happens:

1. A leader neglects its duties and the followers elect a new leader.
2. The new leader manages to replicate more ops (usually just the NO_OP
   asserting leadership).
3. The leader of the previous term tries to replicate an op to a peer in
   the new term.
4. From the response, it finds out that
   a. It is in an earlier term, so it should step down and increment its
      term.
   b. The last op the peer saw is (leader's index + k) for some k > 0
      (usually k = 1), so the leader will attempt to send ops with
      indexes up through (leader's index + k).
5. The term change is asynchronous, and before it happens the leader
   tries to prepare a new request to the peer whose log is ahead of the
   local log. This causes the ERROR message.
6. The term change happens. The leader steps down, and life goes on.

Note that the leader should also have received a VoteRequest with the
new term and an UpdateConsensus with the new term from the new leader. In
general, this message appears only when the leader is under enough
stress to lose its leadership and be unable to service some consensus
RPCs in a timely manner. It is not in and of itself a problem, and it's a
decent indicator of stress on the leader, so I am leaving the message
but downgrading it to INFO level.

See KUDU-1078 for more information about this situation, especially its
previous causes (which were at one time actually harmful).

Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Reviewed-on: http://gerrit.cloudera.org:8080/12185
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/consensus/consensus_queue.cc
1 file changed, 6 insertions(+), 5 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Mike Percy: Looks good to me, approved
  Will Berkeley: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12185
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib841a52354e5c71450f3e364ab2fdee51ce2adb4
Gerrit-Change-Number: 12185
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>