Posted to user@accumulo.apache.org by Frans Lawaetz <fl...@gmail.com> on 2014/04/14 19:00:31 UTC

ZooKeeper ConnectionLoss in Accumulo 1.4.5

Hi-

I'm running a five-node Accumulo 1.4.5 cluster with zookeeper 3.4.6
distributed across the same systems.

We've seen a couple tserver failures in a manifestation that feels similar
to ACCUMULO-1572 (which was patched in 1.4.5).  What is perhaps unique in
this circumstance is that the user reported these failures occurring
immediately upon entering a command in the accumulo shell.  The commands
were a routine scan and delete.  The error is attached but boils down to:

2014-04-09 21:48:49,552 [zookeeper.ZooLock] WARN : lost connection to
> zookeeper
> 2014-04-09 21:48:49,552 [zookeeper.ZooCache] WARN : Zookeeper error, will
> retry
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /acc....
> 2014-04-09 21:48:49,554 [zookeeper.DistributedWorkQueue] INFO : Got
> unexpected zookeeper event: None
> [ repeat the above a few times and then finally ]
> 2014-04-09 21:48:51,866 [tabletserver.TabletServer] FATAL: Lost ability to
> monitor tablet server lock, exiting.


The zookeeper arrangement here is non-optimal in that they're working on
the same virtualized disk as the hadoop and accumulo processes.  The system
was performing bulk ingest at the time so contention was very likely an
issue.

Zookeeper did report, at essentially the same millisecond:

2014-04-09 21:48:49,551 [myid:1] - WARN
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> following the leader
> java.net.SocketTimeoutException: Read timed out
> [ followed by a number of ]
> 2014-04-09 21:48:49,919 [myid:1] - WARN  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
> session 0x0 due to java.io.IOException: ZooKeeperServer not running


It's important to note however that:

- The ZooKeeper errors above occur many other times in the logs and the
accumulo cluster has been ok.
- The ZooKeeper ensemble recovered without intervention.
- The WARN to FATAL time for Accumulo was just two seconds whereas I was
under the impression the process would only give up after two retry
attempts lasting 30s each.
- Only the tserver on the system where the user was running the accumulo
shell failed and only (we believe) upon issuance of a command.
- accumulo-site.xml on all nodes is configured with three zookeepers so the
system should be attempting to fail over.

Thanks,
Frans

Re: ZooKeeper ConnectionLoss in Accumulo 1.4.5

Posted by Eric Newton <er...@gmail.com>.
Make sure you add some limit to the New Generation size.  We have
"-XX:NewSize=500m -XX:MaxNewSize=500m " in the 3G version of
accumulo-env.sh.  You can go larger than 500m, but try to keep it
small (~1G).
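
Just as a sketch of what I mean (keep whatever heap settings you already
have; the -Xmx/-Xms numbers below are placeholders, only the NewSize caps
matter here):

  # accumulo-env.sh -- sketch only, heap sizes are placeholders
  export ACCUMULO_TSERVER_OPTS="-Xmx3g -Xms3g -XX:NewSize=500m -XX:MaxNewSize=500m"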

Look for evidence of a stop-the-world java garbage collection.

1) look for "gc" lines in the tablet server logs:

 $ grep gc logs/tserver*.debug.log

You should see one line every second for a busy server.  Big delays
between gc lines are good evidence of a stop-the-world gc.
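
If you want to put a number on those delays, something like this works
(needs GNU awk, and assumes the first two fields of each line are the
date and time, which is how our debug logs look):

 $ grep -h ' gc ' logs/tserver*.debug.log | awk '
     # 5s gap threshold is arbitrary; tune to taste
     { t = mktime(gensub(/[-:]/, " ", "g", $1 " " substr($2, 1, 8))) }
     prev && t - prev > 5 { print (t - prev) "s gap before: " $0 }
     { prev = t }'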

2) big leaps in gc

Looking at the GC lines, check whether a single collection recovers
several gigabytes; I have seen that happen just before zookeeper
disconnects.  You can reduce the -XX:CMSInitiatingOccupancyFraction=75
setting to something even smaller so the collector starts earlier.  I'm
just guessing about this one... I suspect the OS is putting us in swap.
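
That flag normally lives next to the heap settings in accumulo-env.sh; I
believe the stock example configs put it in ACCUMULO_GENERAL_OPTS.  The 60
below is just an illustration, not a recommendation:

  # sketch -- adjust wherever your accumulo-env.sh sets the CMS flags
  export ACCUMULO_GENERAL_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60"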

3) Swap... maybe?

I know you have no swap, but the OS might not be keeping the pages you
want resident if they have gone unused.  Maybe this isn't possible: my
low-level understanding of how the page cache works is almost non-existent.
I've seen large, mostly idle tservers lose their locks while doing gc.
 This does not happen if we flush OS buffers periodically, ensuring
that free RAM is plentiful. Of course, this hurts performance of the
file system.
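
Roughly the sort of thing I mean by flushing (the drop_caches write needs
root; put it in cron if you want it periodic):

 $ free -m                                    # see how much RAM is sitting in page cache
 $ sync && echo 1 > /proc/sys/vm/drop_caches  # drop clean page cache so free RAM stays plentiful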

4) make sure you are using the native map.
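
Quick ways to check (the library path below is just where it lands on our
install, yours may differ):

 $ grep -A 1 tserver.memory.maps.native.enabled conf/accumulo-site.xml
 $ ls $ACCUMULO_HOME/lib/native/map/     # the libNativeMap shared library should be here

If the property isn't in your site.xml you're on the default (enabled), but
the tserver falls back to the Java in-memory map when the shared library
isn't present -- the startup log should say which one it loaded.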

-Eric



On Mon, Apr 14, 2014 at 2:59 PM, Frans Lawaetz <fl...@gmail.com> wrote:
> The system swappiness warning is a bit of a red herring in that the systems
> aren't configured with any swap space.  They all have 64GB RAM of which
> currently ~50GB is sitting as fs cache.  The load on these systems was very
> high during ingest so I'm sure there was IO latency even without swap use.
>
> In reviewing the log I see lots of promises about "will retry" (without the
> usual 250 or 500ms qualifier) for the various connections to ZK that are
> lost, followed by a fatal event once the tablet server lock is lost.  It's
> not clear to me though that Accumulo does actually try to reconnect or fail
> over.
>
> Given that the other tservers stayed up, as well as the master, all of whom
> were configured to use the same ZK members, it would appear that there were
> functional ZK services available and that the failing tserver bailed
> prematurely.
>
> Beyond the ZK connection timeout parameter (set to 30s by default) are there
> other settings that can make accumulo more tolerant of ZK glitches?
>
>
>
> On Mon, Apr 14, 2014 at 1:11 PM, Sean Busbey <bu...@cloudera.com> wrote:
>>
>> The log looks like it is retrying the ZK connection issues but that it
>> independently lost the lock.
>>
>> The very start of the log claims you have vm.swappiness set to 60. Can you
>> zero this out and see if the issue still happens?
>>
>> Also, check to see if you're hitting swap once the user is running a shell
>> command on that host. If you start swapping the pauses will cause services
>> to lose their ZK locks.
>>
>>
>>
>>
>> On Mon, Apr 14, 2014 at 10:00 AM, Frans Lawaetz <fl...@gmail.com>
>> wrote:
>>>
>>>
>>> Hi-
>>>
>>> I'm running a five-node Accumulo 1.4.5 cluster with zookeeper 3.4.6
>>> distributed across the same systems.
>>>
>>> We've seen a couple tserver failures in a manifestation that feels
>>> similar to ACCUMULO-1572 (which was patched in 1.4.5).  What is perhaps
>>> unique in this circumstance is that the user reported these failures
>>> occurring immediately upon entering a command in the accumulo shell.  The
>>> commands were a routine scan and delete.  The error is attached but boils
>>> down to:
>>>
>>>> 2014-04-09 21:48:49,552 [zookeeper.ZooLock] WARN : lost connection to
>>>> zookeeper
>>>> 2014-04-09 21:48:49,552 [zookeeper.ZooCache] WARN : Zookeeper error,
>>>> will retry
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /acc....
>>>> 2014-04-09 21:48:49,554 [zookeeper.DistributedWorkQueue] INFO : Got
>>>> unexpected zookeeper event: None
>>>> [ repeat the above a few times and then finally ]
>>>> 2014-04-09 21:48:51,866 [tabletserver.TabletServer] FATAL: Lost ability
>>>> to monitor tablet server lock, exiting.
>>>
>>>
>>> The zookeeper arrangement here is non-optimal in that they're working on
>>> the same virtualized disk as the hadoop and accumulo processes.  The system
>>> was performing bulk ingest at the time so contention was very likely an
>>> issue.
>>>
>>> Zookeeper did report, at essentially the same millisecond:
>>>
>>>> 2014-04-09 21:48:49,551 [myid:1] - WARN
>>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>>>> following the leader
>>>> java.net.SocketTimeoutException: Read timed out
>>>> [ followed by a number of ]
>>>> 2014-04-09 21:48:49,919 [myid:1] - WARN
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
>>>> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
>>>> running
>>>
>>>
>>> It's important to note however that:
>>>
>>> - The ZooKeeper errors above occur many other times in the logs and the
>>> accumulo cluster has been ok.
>>> - The ZooKeeper ensemble recovered without intervention.
>>> - The WARN to FATAL time for Accumulo was just two seconds whereas I was
>>> under the impression the process would only give up after two retry attempts
>>> lasting 30s each.
>>> - Only the tserver on the system where the user was running the accumulo
>>> shell failed and only (we believe) upon issuance of a command.
>>> - accumulo-site.xml on all nodes is configured with three zookeepers so
>>> the system should be attempting to fail over.
>>>
>>> Thanks,
>>> Frans
>>>
>>
>>
>>
>> --
>> Sean
>
>
>
>
> --
> Ph: 617.306.8083

Re: ZooKeeper ConnectionLoss in Accumulo 1.4.5

Posted by Frans Lawaetz <fl...@gmail.com>.
The system swappiness warning is a bit of a red herring in that the systems
aren't configured with any swap space.  They all have 64GB RAM of which
currently ~50GB is sitting as fs cache.  The load on these systems was very
high during ingest so I'm sure there was IO latency even without swap use.

In reviewing the log I see lots of promises about "will retry" (without the
usual 250 or 500ms qualifier) for the various connections to ZK that are
lost, followed by a fatal event once the tablet server lock is lost.  It's
not clear to me though that Accumulo does actually try to reconnect or fail
over.

Given that the other tservers stayed up, as well as the master, all of whom
were configured to use the same ZK members, it would appear that there were
functional ZK services available and that the failing tserver bailed
prematurely.

Beyond the ZK connection timeout parameter (set to 30s by default) are
there other settings that can make accumulo more tolerant of ZK glitches?
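
For reference, the only two knobs I've found so far (paths are from my
layout) -- I'm hoping there are more:

 $ grep -A 1 instance.zookeeper.timeout conf/accumulo-site.xml    # Accumulo side, 30s if unset
 $ grep -E 'tickTime|SessionTimeout' $ZOOKEEPER_HOME/conf/zoo.cfg # ZK caps sessions at 20*tickTime unless maxSessionTimeout says otherwise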



On Mon, Apr 14, 2014 at 1:11 PM, Sean Busbey <bu...@cloudera.com> wrote:

> The log looks like it is retrying the ZK connection issues but that it
> independently lost the lock.
>
> The very start of the log claims you have vm.swappiness set to 60. Can you
> zero this out and see if the issue still happens?
>
> Also, check to see if you're hitting swap once the user is running a shell
> command on that host. If you start swapping the pauses will cause services
> to lose their ZK locks.
>
>
>
>
> On Mon, Apr 14, 2014 at 10:00 AM, Frans Lawaetz <fl...@gmail.com> wrote:
>
>>
>> Hi-
>>
>> I'm running a five-node Accumulo 1.4.5 cluster with zookeeper 3.4.6
>> distributed across the same systems.
>>
>> We've seen a couple tserver failures in a manifestation that feels
>> similar to ACCUMULO-1572 (which was patched in 1.4.5).  What is perhaps
>> unique in this circumstance is that the user reported these failures
>> occurring immediately upon entering a command in the accumulo shell.  The
>> commands were a routine scan and delete.  The error is attached but boils
>> down to:
>>
>> 2014-04-09 21:48:49,552 [zookeeper.ZooLock] WARN : lost connection to
>>> zookeeper
>>> 2014-04-09 21:48:49,552 [zookeeper.ZooCache] WARN : Zookeeper error,
>>> will retry
>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> KeeperErrorCode = ConnectionLoss for /acc....
>>> 2014-04-09 21:48:49,554 [zookeeper.DistributedWorkQueue] INFO : Got
>>> unexpected zookeeper event: None
>>> [ repeat the above a few times and then finally ]
>>> 2014-04-09 21:48:51,866 [tabletserver.TabletServer] FATAL: Lost ability
>>> to monitor tablet server lock, exiting.
>>
>>
>> The zookeeper arrangement here is non-optimal in that they're working on
>> the same virtualized disk as the hadoop and accumulo processes.  The system
>> was performing bulk ingest at the time so contention was very likely an
>> issue.
>>
>> Zookeeper did report, at essentially the same millisecond:
>>
>> 2014-04-09 21:48:49,551 [myid:1] - WARN
>>>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>>> following the leader
>>> java.net.SocketTimeoutException: Read timed out
>>> [ followed by a number of ]
>>> 2014-04-09 21:48:49,919 [myid:1] - WARN  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
>>> session 0x0 due to java.io.IOException: ZooKeeperServer not running
>>
>>
>> It's important to note however that:
>>
>> - The ZooKeeper errors above occur many other times in the logs and the
>> accumulo cluster has been ok.
>> - The ZooKeeper ensemble recovered without intervention.
>> - The WARN to FATAL time for Accumulo was just two seconds whereas I was
>> under the impression the process would only give up after two retry
>> attempts lasting 30s each.
>> - Only the tserver on the system where the user was running the accumulo
>> shell failed and only (we believe) upon issuance of a command.
>> - accumulo-site.xml on all nodes is configured with three zookeepers so
>> the system should be attempting to fail over.
>>
>> Thanks,
>> Frans
>>
>>
>
>
> --
> Sean
>



-- 
Ph: 617.306.8083

Re: ZooKeeper ConnectionLoss in Accumulo 1.4.5

Posted by Sean Busbey <bu...@cloudera.com>.
The log looks like it is retrying the ZK connection issues but that it
independently lost the lock.

The very start of the log claims you have vm.swappiness set to 60. Can you
zero this out and see if the issue still happens?

Also, check to see if you're hitting swap once the user is running a shell
command on that host. If you start swapping the pauses will cause services
to lose their ZK locks.
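
Roughly, on a stock Linux box (the sysctl change takes effect immediately
but won't survive a reboot unless you also put it in /etc/sysctl.conf):

 $ sysctl vm.swappiness                # confirm the value the log warned about
 $ sudo sysctl -w vm.swappiness=0      # zero it on the running system
 $ vmstat 5                            # watch the si/so columns while the shell command runs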




On Mon, Apr 14, 2014 at 10:00 AM, Frans Lawaetz <fl...@gmail.com> wrote:

>
> Hi-
>
> I'm running a five-node Accumulo 1.4.5 cluster with zookeeper 3.4.6
> distributed across the same systems.
>
> We've seen a couple tserver failures in a manifestation that feels similar
> to ACCUMULO-1572 (which was patched in 1.4.5).  What is perhaps unique in
> this circumstance is that the user reported these failures occurring
> immediately upon entering a command in the accumulo shell.  The commands
> were a routine scan and delete.  The error is attached but boils down to:
>
> 2014-04-09 21:48:49,552 [zookeeper.ZooLock] WARN : lost connection to
>> zookeeper
>> 2014-04-09 21:48:49,552 [zookeeper.ZooCache] WARN : Zookeeper error, will
>> retry
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /acc....
>> 2014-04-09 21:48:49,554 [zookeeper.DistributedWorkQueue] INFO : Got
>> unexpected zookeeper event: None
>> [ repeat the above a few times and then finally ]
>> 2014-04-09 21:48:51,866 [tabletserver.TabletServer] FATAL: Lost ability
>> to monitor tablet server lock, exiting.
>
>
> The zookeeper arrangement here is non-optimal in that they're working on
> the same virtualized disk as the hadoop and accumulo processes.  The system
> was performing bulk ingest at the time so contention was very likely an
> issue.
>
> Zookeeper did report, at essentially the same millisecond:
>
> 2014-04-09 21:48:49,551 [myid:1] - WARN
>>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>> following the leader
>> java.net.SocketTimeoutException: Read timed out
>> [ followed by a number of ]
>> 2014-04-09 21:48:49,919 [myid:1] - WARN  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of
>> session 0x0 due to java.io.IOException: ZooKeeperServer not running
>
>
> It's important to note however that:
>
> - The ZooKeeper errors above occur many other times in the logs and the
> accumulo cluster has been ok.
> - The ZooKeeper ensemble recovered without intervention.
> - The WARN to FATAL time for Accumulo was just two seconds whereas I was
> under the impression the process would only give up after two retry
> attempts lasting 30s each.
> - Only the tserver on the system where the user was running the accumulo
> shell failed and only (we believe) upon issuance of a command.
> - accumulo-site.xml on all nodes is configured with three zookeepers so
> the system should be attempting to fail over.
>
> Thanks,
> Frans
>
>


-- 
Sean