Posted to issues@hbase.apache.org by "Biju Nair (JIRA)" <ji...@apache.org> on 2019/05/12 19:38:00 UTC

[jira] [Updated] (HBASE-22057) Impose upper-bound on size of ZK ops sent in a single multi()

     [ https://issues.apache.org/jira/browse/HBASE-22057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Biju Nair updated HBASE-22057:
------------------------------
    Labels: ZooKeeper  (was: )

> Impose upper-bound on size of ZK ops sent in a single multi()
> -------------------------------------------------------------
>
>                 Key: HBASE-22057
>                 URL: https://issues.apache.org/jira/browse/HBASE-22057
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
>              Labels: ZooKeeper
>             Fix For: 3.0.0, 1.5.0, 2.2.0
>
>         Attachments: HBASE-22057-branch-1.patch, HBASE-22057.001.patch, HBASE-22057.002.patch, HBASE-22057.003.patch, HBASE-22057.004.patch
>
>
> In {{ZKUtil#multiOrSequential}}, we accept a list of {{ZKUtilOp}}s to pass down to the {{ZooKeeper#multi(Iterable<Op>)}} method.
> One problem with this approach is that we may generate a large list of ZNodes to mutate in one batch, producing a request that exceeds the allowable client packet length specified by {{jute.maxbuffer}}.
> This problem can manifest when a disabled peer has a large number of WALs queued in ZooKeeper for replication. When that peer is dropped, the RS submits deletes for those queued WALs. The RS then sees ConnectionLoss on the resulting {{multi()}} calls, because the client message is too large (we are trying to delete too many WALs at once). The result (at least in branch-1-ish versions) is that the RS aborts after exhausting the ZK retries, since this operation can never succeed.
> A simple fix would be to impose a maximum number of Ops to run in a single batch inside ZKUtil, and to split the caller-submitted batch into smaller chunks. Before making such a change, I need to make sure that we don't have any expectations on the atomicity of the operations. I'm not sure what ZK provides here -- for the above example, splitting up batches of deletes is not an issue, but there could be issues with batches of creates where only some are applied.
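
The chunking idea in the description could be sketched roughly as below. This is only an illustration, not the attached patch: the class {{OpBatcher}}, the method {{partition}}, and the caller-supplied size estimator are hypothetical names, and the sketch deliberately avoids any ZooKeeper types so that it stands alone. It assumes each chunk would then be sent as its own {{multi()}} call, which means any atomicity would hold only within a chunk and not across the original batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of HBase or ZooKeeper.
final class OpBatcher {
    private OpBatcher() {}

    /**
     * Splits ops, in order, into consecutive batches whose estimated sizes
     * sum to at most maxBytes. An op whose own estimate exceeds maxBytes is
     * placed in a batch by itself rather than dropped.
     */
    static <T> List<List<T>> partition(List<T> ops, int maxBytes, ToIntFunction<T> sizeOf) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        int currentBytes = 0;
        for (T op : ops) {
            int size = sizeOf.applyAsInt(op);
            // Close the current batch if adding this op would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(op);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

A caller might estimate each op's size from its ZNode path length plus any data payload, and pick a limit comfortably below {{jute.maxbuffer}}. Both the estimator and the margin are assumptions left to the caller here.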



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)