Posted to issues@kudu.apache.org by "David Alves (JIRA)" <ji...@apache.org> on 2016/12/08 16:31:58 UTC

[jira] [Resolved] (KUDU-1127) Avoid holding RPC handler threads on replicas that are part of a degraded tablet

     [ https://issues.apache.org/jira/browse/KUDU-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Alves resolved KUDU-1127.
-------------------------------
       Resolution: Fixed
         Assignee: David Alves
    Fix Version/s: 1.2.0

Commit 4d8fe6cf2a1804bae142ddfb5e672af37dad036e did quite a bit in this regard, such as not hanging threads for more than a fixed amount of time and short-circuiting the wait. We might make it async in the future, but we can open a new ticket for that.

> Avoid holding RPC handler threads on replicas that are part of a degraded tablet
> --------------------------------------------------------------------------------
>
>                 Key: KUDU-1127
>                 URL: https://issues.apache.org/jira/browse/KUDU-1127
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: tserver
>    Affects Versions: Private Beta
>            Reporter: Todd Lipcon
>            Assignee: David Alves
>             Fix For: 1.2.0
>
>
> If the client performs a snapshot scan, we may need to wait for the leader to tell us that the timestamp is "safe". If the majority of nodes in a tablet are down, this will never happen. After KUDU-689, we'll wait with a deadline, but even this multi-second wait will end up blocking a lot of RPC handlers, potentially preventing other useful work from getting done.
> We should probably short-circuit the wait in the case that we haven't heard from any leader within the election timeout, and just respond immediately. Alternatively, we could make this an async callback instead of a blocking wait on the handler thread.
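For illustration only, the short-circuit idea described above could be sketched roughly as follows. This is not Kudu's actual code (Kudu is written in C++): the class `SafeTimeWaiter`, its methods, and the result strings are hypothetical names invented for this sketch. It assumes the replica records when it last heard from a leader and gives up early once that contact is older than the election timeout, rather than blocking until the full deadline.

```python
import threading
import time


class SafeTimeWaiter:
    """Hypothetical helper; an illustrative sketch, not Kudu's implementation."""

    def __init__(self, election_timeout_s):
        self._cond = threading.Condition()
        self._safe_time = 0
        self._election_timeout_s = election_timeout_s
        # Assumption: treat construction as the last leader contact, so a
        # freshly started replica gets one election timeout of grace.
        self._last_leader_contact = time.monotonic()

    def on_leader_update(self, safe_time):
        """Record a message from the leader that may advance safe time."""
        with self._cond:
            self._last_leader_contact = time.monotonic()
            self._safe_time = max(self._safe_time, safe_time)
            self._cond.notify_all()

    def _leader_is_silent(self):
        elapsed = time.monotonic() - self._last_leader_contact
        return elapsed > self._election_timeout_s

    def wait_until_safe(self, timestamp, max_wait_s):
        """Return "ok", "no_leader", or "timed_out"."""
        deadline = time.monotonic() + max_wait_s
        with self._cond:
            while self._safe_time < timestamp:
                # Short-circuit: no leader heard from within the election timeout.
                if self._leader_is_silent():
                    return "no_leader"
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return "timed_out"
                # Wake up at least once per election timeout to re-check liveness.
                self._cond.wait(min(remaining, self._election_timeout_s))
        return "ok"
```

In this sketch, a handler that gets "no_leader" would respond to the client immediately, so the handler thread is held for roughly one election timeout instead of the full scan deadline. It is still a blocking wait; the async-callback alternative mentioned above is not shown.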


