Posted to user@zookeeper.apache.org by Hendrik Haddorp <he...@gmx.net> on 2017/01/24 21:06:45 UTC
recover failed node
Hi,
I assume this is quite a standard issue, but I have failed to find a
solution so far. I have a 3-node ZooKeeper 3.4.6 ensemble and one node
lost all its data. My assumption was that when the node comes up again,
ZooKeeper would send over the state from the remaining nodes to
reinitialize it, but that does not seem to happen. So what can I do to
recover my node without changing the two remaining nodes? I tried to copy
the snapshots and logs from one node, but that has not worked so far.
thanks,
Hendrik
Re: recover failed node
Posted by Hendrik Haddorp <he...@gmx.net>.
Hi Ben,
my setup is running on Docker. The work directory is mounted as a Docker
volume and that volume got lost. Only the config was left. Given that the
ports and host names did not change, I did not expect any communication
problems. But looking into the logs again as you suggested, I found that
the healthy node could not reach the node that had failed. We had an
additional problem with the Docker host of that machine, which is also
why the volume was lost, and it looks like the DNS lookup had a problem.
After I restarted one of the good nodes, ZooKeeper recovered and all
nodes are healthy again :-)
thanks,
Hendrik
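
For anyone who ends up in a similar situation, the kind of check that
would have pointed at the DNS problem can be sketched roughly as below.
This is only a minimal sketch, not something taken from my setup: the
host names are hypothetical placeholders, and the ports are ZooKeeper's
conventional defaults (2181 client, 2888 quorum, 3888 leader election),
so adjust both to whatever your zoo.cfg actually uses.

```python
import socket

# Hypothetical ensemble members; replace with the host names from your zoo.cfg.
ENSEMBLE = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
# Conventional ZooKeeper ports (assumed defaults, check your own config).
PORTS = {"client": 2181, "quorum": 2888, "election": 3888}


def resolve(host):
    """Return the IPv4 address for host, or None if the lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ensemble(hosts=ENSEMBLE, ports=PORTS):
    """Collect DNS and port reachability results for every ensemble member."""
    report = {}
    for host in hosts:
        ip = resolve(host)
        if ip is None:
            # No address means no point in probing ports for this host.
            report[host] = {"ip": None, "ports": {}}
            continue
        report[host] = {
            "ip": ip,
            "ports": {name: port_open(host, port) for name, port in ports.items()},
        }
    return report
```

Calling check_ensemble() from inside each container returns a dict per
host. An "ip" of None points at a name resolution problem like the one
described above. Note that only the current leader listens on the quorum
port, so a closed quorum port on a follower is expected; the election
port should be open on every running member.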
On 25.01.2017 01:34, Ben Sherman wrote:
> Do you know why the node lost its data? Are your configuration files
> correct? Is it trying to join the ensemble? Are there any mentions of the
> broken node trying to reach the good nodes in the good nodes' logs?
Re: recover failed node
Posted by Ben Sherman <be...@gmail.com>.
Do you know why the node lost its data? Are your configuration files
correct? Is it trying to join the ensemble? Are there any mentions of the
broken node trying to reach the good nodes in the good nodes' logs?