Posted to user@accumulo.apache.org by Frans Lawaetz <fl...@gmail.com> on 2014/02/24 18:32:00 UTC

setting zookeeper forceSync=no

Hi-

I acknowledge in advance that what I'm asking goes against best practices
as described here and in the ZooKeeper guides. I was wondering what the
possible consequences are of setting forceSync=no in zoo.cfg in
stand-alone installations where a single machine hosts Accumulo, ZooKeeper,
Hadoop, etc.

This sort of configuration is obviously not for production. It is used only
when a client wants to see a demo of an Accumulo-based application but has
just a single machine available at the time, often with a single drive
serving all mounted file systems.  As one might expect in this sort of
setup, the ZooKeeper log starts to populate with:

zookeeper.log.9:2014-01-21 19:19:38,885 [myid:] - WARN
 [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in
SyncThread:0 took 5898ms which will adversely effect operation latency. See
the ZooKeeper troubleshooting guide
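
To gauge how often and how badly this happens, the warnings can be pulled
out of the log with a short script. This is only a rough sketch: the
function name and the 1000 ms threshold are made up for illustration, and
the pattern is based solely on the warning text quoted above.

```python
import re

# Matches the fsync warning quoted above; the duration is captured in ms.
FSYNC_WARNING = re.compile(
    r"fsync-ing the write ahead log in (\S+) took (\d+)ms"
)


def slow_fsyncs(log_text, threshold_ms=1000):
    """Return (thread, duration_ms) for each fsync warning >= threshold_ms.

    threshold_ms is an arbitrary illustrative default, not a ZooKeeper setting.
    """
    # Collapse whitespace so a warning wrapped across lines still matches.
    flattened = " ".join(log_text.split())
    return [
        (thread, int(ms))
        for thread, ms in FSYNC_WARNING.findall(flattened)
        if int(ms) >= threshold_ms
    ]


sample = """2014-01-21 19:19:38,885 [myid:] - WARN
 [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in
SyncThread:0 took 5898ms which will adversely effect operation latency."""

print(slow_fsyncs(sample))
```

Run against the sample above, it should report the single 5898 ms sync from
SyncThread:0.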

Eventually Accumulo will time out with a ConnectionLoss and the master
process will go down.

Is Accumulo's use of ZooKeeper primarily for cluster-wide synchronization
at run time, or is there persistent stateful data that must be kept in
sync with the contents of walogs and/or table files in HDFS?

If the former, then I imagine (in a stand-alone setup) that ZooKeeper
corruption due to incomplete syncs during a power failure or the like could
be remedied by restarting the stack, which would recover a prior ZooKeeper
snapshot.  If it's the latter, then I can see things getting a bit messy.

Thanks in advance.

Frans



--

Re: setting zookeeper forceSync=no

Posted by Eric Newton <er...@gmail.com>.

ZooKeeper is used to find the root tablet and the WALogs for the root
tablet.  Recovering the root tablet with old WALogs would corrupt the
!METADATA table.

In addition, the distributed locks are used to maintain the right of any
tablet server to host the tablets assigned to it.  A ZooKeeper restart
that loses sessions will drop the locks, causing the tablet servers to
stop.
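
The lock behaviour can be pictured with a toy in-memory model. This is not
Accumulo or ZooKeeper code: the class, its methods, and the lock path are
hypothetical, and the only thing it assumes is the general ZooKeeper rule
that ephemeral nodes are deleted when the session that created them ends.

```python
class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper's ephemeral nodes (illustration only)."""

    def __init__(self):
        self.ephemeral = {}  # path -> id of the session that created the node

    def acquire(self, path, session_id):
        """Create an ephemeral lock node; fail if the path is already held."""
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session_id
        return True

    def expire_session(self, session_id):
        """Ending a session removes every ephemeral node it created."""
        self.ephemeral = {
            p: s for p, s in self.ephemeral.items() if s != session_id
        }

    def holds(self, path, session_id):
        """True if the given session still owns the lock node at path."""
        return self.ephemeral.get(path) == session_id


zk = ToyZooKeeper()
zk.acquire("/tservers/host1", session_id=1)  # hypothetical lock path
zk.expire_session(1)                         # e.g. sessions lost in a restart

# A tablet server whose lock is gone no longer has the right to host its
# tablets, which is why it stops.
print(zk.holds("/tservers/host1", 1))
```

In this model the final check reports that the lock is no longer held, which
is the situation a tablet server finds itself in after the sessions are lost.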

-Eric


