Posted to user@accumulo.apache.org by James Srinivasan <ja...@gmail.com> on 2022/02/26 16:24:44 UTC

Accumulo/Zookeeper issue when root table has *lots* of WALs in zk

For the benefit of Google and/or future me, and with huge thanks to Ed
Coleman, here’s a quick summary of an issue we hit with Accumulo 1.7.0 and
the fix. Details are in Slack, but with a few red herrings (thanks to me).
Some of this is fat-fingered, so apologies for any typos:



We recently needed to bounce our moderately sized (19 node) cluster (log4j
patching on other stuff), but Accumulo failed to restart. Four of the nodes
had been down for some time (root cause unknown).



Symptoms



1) The Accumulo monitor showed the list of tables, but "-" against every entry

2) Accumulo files looked ok in HDFS

3) scan -t accumulo.root (with debug on) in the Accumulo shell gave “Failed
to locate tablet for table : +r row :”

4) There were some Zookeeper warnings in some logs (I forget
precisely which), but they weren't hugely informative - ConnectionLoss
for /accumulo/{id}/root_tablet/walogs. This turned out to be critical, but
I didn't realise it at the time.

5) The Zookeeper nodes showed that a tserver should be hosting the root
tablet (/accumulo/{id}/root_tablet/location), but that tserver did not hold
its own lock
(/accumulo/{id}/tservers/mytservername.domain:9997/zlock-00000000)

6) Using the Zookeeper CLI, ls /accumulo/{id}/root_tablet/walogs bombed out
with a familiar-looking ConnectionLoss, although with some more helpful
info: "Packet len is out of range" (see the zkCli sketch after this list)


Cause


Zookeeper clients (whether the CLI or an Accumulo tserver) were failing to
list a znode with a large number of children due to insufficient buffer
space. See the docs on jute.maxbuffer here -
https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#Unsafe+Options

Quite why there were so many children of the walogs node is unknown, but it
may have been due to the four inactive tservers.
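
As a rough sanity check on the numbers (my assumption: each child name plus
framing costs on the order of 40 bytes in the getChildren response, since
the names are long-ish identifiers):

    default jute.maxbuffer:  0xfffff bytes (~1 MB)
    1,048,575 / ~40 bytes   =  roughly 26,000 children before ls overflows
    8,000,000 / ~40 bytes   =  roughly 200,000 children with an 8 MB buffer

So tens of thousands of leaked WAL entries are enough to make the root
tablet unloadable with the default setting.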


Fix


Set "-Djute.maxbuffer=big_value" for all Accumulo processes seemed to fix
things. For me, big_value was around 8000000 (i.e. 8MB). Accumulo came back
slowly, found all its data files and then the number of children of the zk
walogs node dropped substantially.
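
For reference, this is roughly what we set (a sketch against the example
1.x conf/accumulo-env.sh; if your env file doesn't define
ACCUMULO_GENERAL_OPTS, add the flag to each per-process *_OPTS variable
instead):

    # conf/accumulo-env.sh - JVM opts shared by all Accumulo processes
    export ACCUMULO_GENERAL_OPTS="${ACCUMULO_GENERAL_OPTS} -Djute.maxbuffer=8000000"

Note the Zookeeper admin docs linked above warn that if jute.maxbuffer is
changed, it should be changed on all servers and clients alike.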

Re: Accumulo/Zookeeper issue when root table has *lots* of WALs in zk

Posted by Christopher <ct...@apache.org>.
Thanks for the write-up! It's great that you were able to get things
back up and running. I was following your conversation in the Slack
channel. Hopefully this will help others if they run into something
similar.

Also, just wanted to mention, since you said you were running 1.7.0,
that 1.7.0 is subject to CVE-2020-17533, as well as lots of other
bugs. At the very least, you should be able to upgrade to the latest
1.7 release (1.7.4) as a drop-in, which will fix a few critical bugs,
including at least one involving potential data loss. Ideally, though,
you should try to upgrade to 1.10.2, which is the latest (and only)
still-maintained 1.x version.
