Posted to issues@kudu.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2016/04/02 00:36:25 UTC

[jira] [Resolved] (KUDU-1073) Single TS falling too far behind hung YCSB

     [ https://issues.apache.org/jira/browse/KUDU-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved KUDU-1073.
--------------------------------------
          Resolution: Cannot Reproduce
       Fix Version/s: n/a
    Target Version/s:   (was: GA)

Haven't seen that in ages.

> Single TS falling too far behind hung YCSB
> ------------------------------------------
>
>                 Key: KUDU-1073
>                 URL: https://issues.apache.org/jira/browse/KUDU-1073
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, consensus
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>            Assignee: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: n/a
>
>
> This caused a YCSB job to fail:
> - a server fell behind for some reason (I haven't root-caused why; maybe it was just a bit slow)
> - the leader GCed the logs needed to catch it up, and thus stopped sending it any heartbeats or other messages
> - the server had one write pending
> - the Java client apparently just kept retrying over and over against the same server
> The server with the pending txn may actually have been the leader at the time the write was issued; otherwise I'm not sure why the Java client keeps retrying it. Or perhaps the Java client got an error on the leader, failed over to try the follower, and RPCs to the follower are timing out.
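
To make the failure mode above concrete, here is a minimal sketch of a client write loop that rotates through replicas and gives up after a bounded number of attempts, rather than retrying the same stuck server indefinitely. It is only an illustration under stated assumptions: `Replica`, `RpcTimeout`, `write_with_failover`, and the `ts-N` names are hypothetical and are not part of the Kudu Java client or any Kudu API.

```python
from dataclasses import dataclass, field


class RpcTimeout(Exception):
    """Raised by a hypothetical replica that does not answer in time."""


@dataclass
class Replica:
    # Hypothetical stand-in for a tablet server; not a Kudu client class.
    name: str
    healthy: bool = True
    applied: list = field(default_factory=list)

    def write(self, row):
        # An unhealthy replica models the server that fell behind:
        # every RPC sent to it times out.
        if not self.healthy:
            raise RpcTimeout(self.name)
        self.applied.append(row)
        return self.name


def write_with_failover(replicas, row, max_attempts=6):
    """Rotate through the replicas, stopping after max_attempts tries."""
    attempts = []
    for attempt in range(max_attempts):
        target = replicas[attempt % len(replicas)]
        attempts.append(target.name)
        try:
            return target.write(row), attempts
        except RpcTimeout:
            # Move on to the next replica instead of retrying this one.
            continue
    # Bounded retries: fail loudly rather than hang the job.
    raise RpcTimeout(f"no replica accepted the write after {attempts}")


stuck = Replica("ts-1", healthy=False)  # the server that fell behind
others = [Replica("ts-2"), Replica("ts-3")]
winner, tried = write_with_failover([stuck] + others, {"key": 1})
print(winner, tried)
```

In this sketch the first attempt hits the stuck server, times out, and the second attempt succeeds on the next replica. If every replica times out, the loop raises after `max_attempts` tries instead of retrying forever.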


