Posted to user@zookeeper.apache.org by Hendrik Haddorp <he...@gmx.net> on 2017/01/24 21:06:45 UTC

recover failed node

Hi,

I assume this is quite a standard issue, but I have failed to find a
solution so far. I have a 3-node ZooKeeper 3.4.6 ensemble and one node lost
all its data. My assumption was that when the node comes up again, ZooKeeper
would send over the state from the remaining nodes to reinitialize it,
but that does not seem to happen. So what can I do to recover my node
without changing the two remaining nodes? I tried copying the snapshots and
logs from one of the healthy nodes, but that has not worked so far.

thanks,
Hendrik

Re: recover failed node

Posted by Hendrik Haddorp <he...@gmx.net>.
Hi Ben,
my setup is running on Docker. The work directory is mounted as a Docker
volume, and that volume got lost; only the config was left. Given that none
of the ports or host names changed, I did not expect any communication
problems. But looking into the logs again as you suggested, I found that
the healthy node could not reach the node that had failed. We had an
additional problem with the Docker host of that machine, which is also why
the volume was lost, and it looks like the DNS lookup had a problem. After
I restarted one of the good nodes, ZooKeeper recovered and all nodes are
healthy again :-)

thanks,
Hendrik
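
The failure described above (a healthy node that could not resolve or reach
its peer) can be checked with a small script run from each node in turn. This
is only a minimal sketch, assuming Python 3, the ZooKeeper 3.4-style
"server.N=host:port:port" lines in zoo.cfg, and a client port of 2181. The
function names are made up for illustration; "ruok" is one of ZooKeeper's
four-letter commands, and a running server answers it with "imok".

```python
import re
import socket

# Matches 3.4-style lines such as "server.1=zk1:2888:3888" (assumed format).
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(cfg_text):
    """Return {server_id: (host, quorum_port, election_port)} from zoo.cfg text."""
    servers = {}
    for line in cfg_text.splitlines():
        m = SERVER_LINE.match(line.strip())
        if m:
            servers[int(m.group(1))] = (m.group(2), int(m.group(3)), int(m.group(4)))
    return servers


def four_letter(host, port, cmd="ruok", timeout=3.0):
    """Send a four-letter command to a client port and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def check_node(host, client_port=2181):
    """Report whether host resolves via DNS and answers 'ruok' (hypothetical helper)."""
    try:
        addr = socket.gethostbyname(host)
    except OSError as exc:
        return f"{host}: DNS lookup failed ({exc})"
    try:
        reply = four_letter(host, client_port)
    except OSError as exc:
        return f"{host} ({addr}): not reachable on port {client_port} ({exc})"
    return f"{host} ({addr}): ruok -> {reply!r}"
```

Calling check_node() for every host returned by parse_servers() on each
machine shows which peers that machine can actually resolve and reach. A DNS
failure on only one of the good nodes would match the situation in this
thread, where restarting that node cleared the problem.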

On 25.01.2017 01:34, Ben Sherman wrote:
> Do you know why the node lost its data?  Are your configuration files
> correct?  Is it trying to join the ensemble?  Are there any mentions of the
> broken node trying to reach the good nodes in the good nodes' logs?


Re: recover failed node

Posted by Ben Sherman <be...@gmail.com>.
Do you know why the node lost its data?  Are your configuration files
correct?  Is it trying to join the ensemble?  Are there any mentions of the
broken node trying to reach the good nodes in the good nodes' logs?
