Posted to dev@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/04/02 03:05:06 UTC

[kudu-CR](branch-0.8.x) KUDU-1387. Fix a case where the scanner tight-loops and then sleeps too long

Todd Lipcon has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2709

Change subject: KUDU-1387. Fix a case where the scanner tight-loops and then sleeps too long
......................................................................

KUDU-1387. Fix a case where the scanner tight-loops and then sleeps too long

This prevents the following issue:

- the leader is down and an election has not yet been triggered
- the scanner tries to contact the leader, gets 'connection refused', marks
  the leader as down, and goes back to the scanner retry loop
- in the tablet lookup path, RemoteTablet::HasLeader() returns false because
  the leader is known to be down. This causes the client to fetch new locations.
- fetching new locations marks the server as up again. This logic is dubious,
  but would be more complicated to address.
- because the server is now seen as "up" again, we immediately retry the same
  server, and the cycle repeats without any backoff.
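The cycle above can be sketched as a small self-contained C++ model. This is a hypothetical simplification, not Kudu's real code: ModelMetaCache, SimulateBuggyLoop and the server name "ts-leader" are made up for illustration, and HasLeader() here only stands in for RemoteTablet::HasLeader(). Against a leader that always refuses connections, every loop iteration sends an RPC and none of them sleeps.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, heavily simplified model of the client-side state described
// above. None of these types or functions are Kudu's real API.
struct ModelMetaCache {
  std::set<std::string> down_servers;  // servers marked down client-wide

  void MarkServerDown(const std::string& ts) { down_servers.insert(ts); }

  // Stand-in for RemoteTablet::HasLeader(): false once the leader is down.
  bool HasLeader(const std::string& leader) const {
    return down_servers.count(leader) == 0;
  }

  // Fetching new locations marks the returned servers as up again.
  void RefreshLocations() { down_servers.clear(); }
};

struct LoopStats {
  int rpcs = 0;    // scan RPCs sent to the dead leader
  int sleeps = 0;  // backoff sleeps taken between attempts
};

// Runs `attempts` iterations of the pre-patch retry loop against a leader
// that always refuses connections.
LoopStats SimulateBuggyLoop(int attempts) {
  assert(attempts >= 0);
  ModelMetaCache cache;
  const std::string leader = "ts-leader";  // hypothetical server name
  LoopStats stats;
  for (int i = 0; i < attempts; ++i) {
    // Lookup path: no usable leader, so fetch new locations, which
    // marks the leader as "up" again.
    if (!cache.HasLeader(leader)) cache.RefreshLocations();
    // The leader now looks healthy, so the scanner retries it right away.
    ++stats.rpcs;
    cache.MarkServerDown(leader);  // 'connection refused'
    // Nothing here ever concludes that all servers are unusable, so the
    // loop never sleeps: stats.sleeps is never incremented.
  }
  return stats;
}
```

For example, SimulateBuggyLoop(1000) reports 1000 RPCs and 0 sleeps: the "down" mark is always erased by the location refresh before the scanner can act on it.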

The patch fixes the scanner code so that, when a tablet server is down, it is added
to the scan's blacklist in addition to being marked as down client-wide.
This lets the scanner notice that all eligible servers are blacklisted and
trigger a sleep and backoff before retrying.
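The effect of the fix can be sketched with the same kind of hypothetical model. SimulateFixedLoop, PickServer, the 10 ms initial / 1000 ms maximum backoff, and the choice to clear the blacklist after each backoff are all assumptions made for illustration, not Kudu's actual values or behavior. A real client would actually sleep, whereas the model only adds up the delays. The key point is that a location refresh clears the client-wide "down" marks but leaves the per-scan blacklist untouched, so the scanner sees that no eligible server is left and backs off.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct FixedLoopStats {
  int rpcs = 0;            // scan RPCs sent to the dead leader
  int sleeps = 0;          // backoff sleeps taken
  int total_sleep_ms = 0;  // simulated time spent backing off
};

// Hypothetical helper: returns the first eligible server that is not on the
// scan's blacklist, or "" if every eligible server is blacklisted.
std::string PickServer(const std::vector<std::string>& eligible,
                       const std::set<std::string>& blacklist) {
  for (const auto& ts : eligible) {
    if (blacklist.count(ts) == 0) return ts;
  }
  return "";
}

// Runs `attempts` iterations of a post-patch-style retry loop against a
// leader that always refuses connections.
FixedLoopStats SimulateFixedLoop(int attempts) {
  assert(attempts >= 0);
  const std::vector<std::string> eligible = {"ts-leader"};  // hypothetical
  std::set<std::string> down_servers;  // client-wide "down" marks
  std::set<std::string> blacklist;     // per-scan blacklist (the fix)
  const int kMaxBackoffMs = 1000;      // hypothetical cap
  int backoff_ms = 10;                 // hypothetical initial backoff
  FixedLoopStats stats;
  for (int i = 0; i < attempts; ++i) {
    // As before, the lookup path refreshes locations and clears the
    // client-wide "down" marks; the per-scan blacklist is unaffected.
    if (down_servers.count(eligible[0]) != 0) down_servers.clear();
    const std::string ts = PickServer(eligible, blacklist);
    if (ts.empty()) {
      // All eligible servers are blacklisted: back off instead of spinning.
      stats.total_sleep_ms += backoff_ms;  // a real client would sleep here
      ++stats.sleeps;
      backoff_ms = std::min(backoff_ms * 2, kMaxBackoffMs);
      blacklist.clear();  // assumption: re-probe servers after backing off
      continue;
    }
    ++stats.rpcs;
    // 'connection refused': mark down client-wide AND blacklist for this scan.
    down_servers.insert(ts);
    blacklist.insert(ts);
  }
  return stats;
}
```

With 10 iterations, this model alternates RPC and sleep, giving 5 RPCs and 5 sleeps of 10, 20, 40, 80 and 160 ms (310 ms in total). Under the same conditions, the buggy loop would send an RPC on every iteration.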

Without this patch, linked_list-test timed out a few percent of the time
in RELEASE builds. With the patch, it passed 200/200 times. I also noticed
that an existing test in client-test was triggering the tight retries, but
didn't have any assertion to detect the problematic number of RPCs.

Change-Id: I3cb3afa81cd6f75756c328b6ffe23a385f4b172d
Reviewed-on: http://gerrit.cloudera.org:8080/2699
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
(cherry picked from commit 563313d15e922db4255736ed1423bb418bbcd6fd)
---
M src/kudu/client/client-test.cc
M src/kudu/client/scanner-internal.cc
2 files changed, 34 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/2709/1
-- 
To view, visit http://gerrit.cloudera.org:8080/2709

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3cb3afa81cd6f75756c328b6ffe23a385f4b172d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-0.8.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>