Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/06/12 04:25:00 UTC

[jira] [Created] (HBASE-24544) Recommend upping zk jute.maxbuffer in all but minor installs

Michael Stack created HBASE-24544:
-------------------------------------

Summary: Recommend upping zk jute.maxbuffer in all but minor installs
Key: HBASE-24544
URL: https://issues.apache.org/jira/browse/HBASE-24544
Project: HBase
Issue Type: Bug
Components: documentation
Reporter: Michael Stack

Add a doc note to the upgrade section and to the zookeeper section recommending that zk jute.maxbuffer be raised above its default of just under 1M.

Here is the jute.maxbuffer entry from the zk doc.

{code}
jute.maxbuffer:
(Java system property: jute.maxbuffer)
This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size.
{code}

It seems easy enough to blow the 1MB default. Here is one such scenario. A replication peer is disabled, so WALs back up on each RegionServer, or a bug keeps us from clearing WALs out from under the RegionServer promptly. Backed-up WALs get into the hundreds... easy enough on a busy cluster. Next, there is a power outage and the cluster crashes.

Recovery may require an SCP (ServerCrashProcedure) to recover hundreds of WALs. The way our SCP works today, we can end up with a /hbase/splitWAL dir with hundreds -- even thousands -- of WALs in it. The 1MB buffer limit in zk can't carry listings this big.
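
To put rough numbers on the listing size, here is a back-of-envelope sketch. It assumes the child listing is encoded as a 4-byte count followed, per name, by a 4-byte length and the UTF-8 bytes of the name, and it ignores any response header. The sample znode name is hypothetical (a URL-encoded WAL path made up for illustration); real names and lengths will differ per cluster.

```java
import java.nio.charset.StandardCharsets;

// Back-of-envelope sketch: how large is a child listing of N WAL znodes?
// Assumes a length-prefixed encoding: a 4-byte count, then for each name a
// 4-byte length followed by its UTF-8 bytes. Response headers are ignored.
class ListingSizeEstimate {
    // Default jute.maxbuffer quoted in the zk doc: 0xfffff, just under 1M.
    static final int DEFAULT_MAX_BUFFER = 0xfffff;

    // Estimated bytes for a listing of `count` names shaped like sampleName.
    static long listingBytes(String sampleName, int count) {
        int nameBytes = sampleName.getBytes(StandardCharsets.UTF_8).length;
        return 4L + (long) count * (4L + nameBytes);
    }

    // Largest number of such names whose listing stays within `limit` bytes.
    static int maxNamesThatFit(String sampleName, int limit) {
        int nameBytes = sampleName.getBytes(StandardCharsets.UTF_8).length;
        return (limit - 4) / (4 + nameBytes);
    }

    public static void main(String[] args) {
        // Hypothetical znode name: a URL-encoded WAL path. Real names vary.
        String sample = "hdfs%3A%2F%2Fnn.example.org%3A8020%2Fhbase%2FWALs%2F"
            + "rs1.example.org%2C16020%2C1591900000000-splitting%2F"
            + "rs1.example.org%252C16020%252C1591900000000.1591930000000";
        for (int n : new int[] {100, 1000, 5000, 10000}) {
            long bytes = listingBytes(sample, n);
            System.out.printf("%d WALs -> ~%d bytes (%s the default)%n", n, bytes,
                bytes > DEFAULT_MAX_BUFFER ? "over" : "under");
        }
        System.out.println("names that fit under the default: "
            + maxNamesThatFit(sample, DEFAULT_MAX_BUFFER));
    }
}
```

With names well over a hundred bytes each, a few thousand entries is enough to cross the default, which lines up with the scenario above.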

Of note, jute.maxbuffer needs to be set on the zk servers -- with a restart so the change is noticed -- and on the client side, in the hbase master at least.
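
Since jute.maxbuffer is a plain Java system property, one way to sanity-check that a given JVM actually sees a raised value is a tiny check like the sketch below. The class and method names are made up for illustration, and the fallback value is the 0xfffff default quoted from the zk doc above; how a particular ZooKeeper version reads the property internally is not covered here.

```java
// Sketch: report the jute.maxbuffer value visible to this JVM.
// Class and method names are hypothetical, for illustration only.
class JuteMaxBufferCheck {
    // Default quoted in the zk doc: 0xfffff, just under 1M.
    static final int ZK_DOC_DEFAULT = 0xfffff;

    // Reads the Java system property, falling back to the documented default.
    static int effectiveMaxBuffer() {
        return Integer.getInteger("jute.maxbuffer", ZK_DOC_DEFAULT);
    }

    // True only when the property is set to something above the default.
    static boolean isRaised() {
        return effectiveMaxBuffer() > ZK_DOC_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println("jute.maxbuffer = " + effectiveMaxBuffer()
            + (isRaised() ? " (raised)" : " (default or lower)"));
    }
}
```

Running it as `java -Djute.maxbuffer=4194304 JuteMaxBufferCheck.java` should report the raised value; 4194304 is only an example figure, not a value recommended by this issue. Whatever value is picked, the zk doc says it must be set on all servers and clients.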

This issue is about highlighting this old problem in our doc. It seems to be totally absent.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)