Posted to user@zookeeper.apache.org by German Blanco <ge...@gmail.com> on 2013/10/01 08:14:18 UTC

Re: avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Hello again,
unfortunately, it doesn't seem to be a silly mistake.
I have dumped the snapshots using SnapshotFormatter and everything in the
three snapshots matches, except for the missing info in one of the three
servers.
I have reopened ZOOKEEPER-1449 because of this. I hope it is possible to
figure out what happened.
Thanks for your help,
Germán.
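
The comparison described above (dumping each server's snapshot to text and checking what differs) can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in this thread. It assumes each dump has already been read as lines of text, and that every line starting with "/" in the first column is a znode path, which is how SnapshotFormatter output is assumed to look here. The function names and the server labels are hypothetical.

```python
from typing import Dict, Iterable, Set


def znode_paths(dump_lines: Iterable[str]) -> Set[str]:
    """Collect znode paths from one text dump.

    Assumption: a line that starts with '/' in column 0 is a znode path;
    indented lines (stat fields, data length) are ignored.
    """
    return {line.rstrip() for line in dump_lines if line.startswith("/")}


def paths_not_on_all_servers(dumps: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map every path missing from at least one server to the servers that hold it."""
    all_paths = set().union(*dumps.values())
    report = {}
    for path in sorted(all_paths):
        holders = {name for name, paths in dumps.items() if path in paths}
        if len(holders) < len(dumps):
            report[path] = holders
    return report


if __name__ == "__main__":
    # Hypothetical, hand-written dump fragments standing in for real files.
    server1 = znode_paths(["/\n", "  cZxid = 0x0\n", "/app\n", "/app/lock-1\n"])
    server2 = znode_paths(["/\n", "/app\n"])
    server3 = znode_paths(["/\n", "/app\n"])
    dumps = {"server1": server1, "server2": server2, "server3": server3}
    for path, holders in paths_not_on_all_servers(dumps).items():
        print(path, "only on", sorted(holders))
```

Comparing only the set of paths keeps the check insensitive to ordering in the dump; it does not compare znode data or stat fields.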


On Sat, Sep 28, 2013 at 8:33 AM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Thank you, I will check. I must admit I haven't searched for this enough :-(
> I have also noticed that there is a development option to force
> synchronisation via snapshot for 3.5.0. That should avoid the problem.
> Anyway, there is something strange that I have noticed. There were two
> nodes showing up in one of the followers of the ensemble (3.4.5 with 3
> nodes) that were not there in the rest, and they were ephemeral nodes. I
> don't think that it is easy for ephemeral nodes to be included in data
> files from another ensemble, since normally the session that created them
> wouldn't be there and they would expire.
> Unfortunately, I no longer have the logs from when this happened. The
> servers were running with DEBUG logging and the logs rotated.
> Any ideas?
>
>
> On Fri, Sep 27, 2013 at 2:00 AM, Alexander Shraer <sh...@gmail.com> wrote:
>
>> Some time in the past we were discussing adding a unique identifier
>> for each ensemble in the config files and checking it, for example
>> when a server tries to connect to the leader. I'm not sure if there is a
>> Jira for this.
>>
>>
>> On Wed, Sep 25, 2013 at 7:44 AM, German Blanco <
>> german.blanco.blanco@gmail.com> wrote:
>>
>> > Exactly.
>> > I know it is silly, but I think this is what happened, and I would feel
>> > better if there were a way to keep it from happening again.
>> >
>> >
>> > On Wed, Sep 25, 2013 at 4:37 AM, Benjamin Reed <br...@apache.org> wrote:
>> >
>> > > when you say inconsistent transaction log, are you talking about a
>> > > transaction log from a different ensemble instance?
>> > >
>> > > for example, you ran zookeeper and did some things. then you reset all
>> > > the servers but one and restarted everything.
>> > >
>> > > ben
>> > >
>> > >
>> > > On Tue, Sep 24, 2013 at 5:45 AM, German Blanco <
>> > > german.blanco.blanco@gmail.com> wrote:
>> > >
>> > > > Hello,
>> > > > I have run into this situation a couple of times.
>> > > > Because of an error, one ZooKeeper server in the ensemble is started
>> > > > with an inconsistent transaction log. This leads to serious,
>> > > > hard-to-trace problems, until you notice that clients connected to
>> > > > one of the servers see a different data tree than the others.
>> > > > I would really like to avoid this, and the amount of data in my data
>> > > > tree happens to be small (around 40 kBytes). So I would like to
>> > > > propose a new option to force synchronization via snapshot in the
>> > > > ZooKeeper Leader.
>> > > > Any opinions?
>> > > > Any other options?
>> > > > Regards,
>> > > >
>> > > > Germán Blanco.
>> > > >
>> > >
>> >
>>
>
>
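
The ensemble identifier idea quoted above (a unique identifier per ensemble, checked when a server connects to the leader) can be sketched as follows. This is a standalone illustration under stated assumptions, not ZooKeeper code and not an existing ZooKeeper feature: `ServerConfig`, `ensemble_id`, `EnsembleMismatch` and `check_same_ensemble` are all hypothetical names.

```python
from dataclasses import dataclass


class EnsembleMismatch(Exception):
    """Raised when a connecting server carries a different ensemble identifier."""


@dataclass(frozen=True)
class ServerConfig:
    server_id: int
    ensemble_id: str  # hypothetical field, assumed to be stored with the server's config


def check_same_ensemble(leader: ServerConfig, follower: ServerConfig) -> None:
    """Reject a follower whose identifier differs from the leader's.

    The check would run before any synchronization, so a server started
    with files from another ensemble is refused instead of joining.
    """
    if follower.ensemble_id != leader.ensemble_id:
        raise EnsembleMismatch(
            f"server {follower.server_id} has ensemble id "
            f"{follower.ensemble_id!r}, leader expects {leader.ensemble_id!r}"
        )


if __name__ == "__main__":
    leader = ServerConfig(server_id=1, ensemble_id="ensemble-a")
    check_same_ensemble(leader, ServerConfig(server_id=2, ensemble_id="ensemble-a"))
    try:
        check_same_ensemble(leader, ServerConfig(server_id=3, ensemble_id="ensemble-b"))
    except EnsembleMismatch as err:
        print("rejected:", err)
```

Such a check only catches data that comes from a different ensemble; it would not detect a stale or damaged transaction log from the same ensemble.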