Posted to user@zookeeper.apache.org by German Blanco <ge...@gmail.com> on 2013/09/24 14:45:59 UTC

avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Hello,
I have run into this situation a couple of times.
Because of an error, one ZooKeeper server in the ensemble is started
with an inconsistent transaction log. This leads to serious problems that are
difficult to trace, until you notice that clients connected to one of the
servers see a different data tree than the others.
I would really like to avoid this, and the amount of data in my data tree
happens to be small (around 40 kBytes). So I would like to
propose a new option to force synchronization via snapshot in the ZooKeeper
Leader.
Any opinions?
Any other options?
Regards,

Germán Blanco.

Re: avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Posted by German Blanco <ge...@gmail.com>.
Hello again,
unfortunately, it doesn't seem to be a silly mistake.
I have dumped the snapshots using SnapshotFormatter and everything in the
three snapshots matches, except for the missing info in one of the three
servers.
I have reopened ZOOKEEPER-1449 because of this. I hope it is possible to
figure out what happened.
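For anyone repeating this comparison, here is a minimal sketch of how the dumps could be diffed. It assumes each SnapshotFormatter dump has been saved as text and that znode paths are the lines starting with "/". That layout is an assumption about the dump format, and the function names are made up for illustration.

```python
def znode_paths(dump_text):
    """Collect znode paths from a snapshot dump.

    Assumption: each znode path sits on its own line and starts with "/".
    """
    return {line.strip() for line in dump_text.splitlines()
            if line.strip().startswith("/")}


def diff_dumps(dumps):
    """Map each server name to the paths it has that some other server lacks.

    `dumps` maps a server name to the text of its snapshot dump.
    """
    paths = {server: znode_paths(text) for server, text in dumps.items()}
    common = set.intersection(*paths.values())
    return {server: sorted(found - common) for server, found in paths.items()}


if __name__ == "__main__":
    # Made-up dumps for three servers; only the path lines matter here.
    dumps = {
        "server1": "/\n/app\n/app/config\n/app/locks/member-1\n",
        "server2": "/\n/app\n/app/config\n/app/locks/member-1\n",
        "server3": "/\n/app\n/app/config\n",
    }
    for server, extra in diff_dumps(dumps).items():
        print(server, extra)
```

A server that reports no extra paths while the others do is the one with missing data.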
Thanks for your help,
Germán.


On Sat, Sep 28, 2013 at 8:33 AM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Thank you, I will check. I must admit I haven't searched for this enough :-(
> I have also noticed that there is a development option to force
> synchronisation via snapshot for 3.5.0. That should avoid the problem.
> Still, there is something strange that I have noticed. There were two
> nodes showing up in one of the followers of the ensemble (3.4.5 with 3
> nodes) that were not there in the rest, and they were ephemeral nodes. I
> don't think that it is easy for ephemeral nodes to be included in data
> files from another ensemble, since normally the session that created them
> wouldn't be there and they would expire.
> Unfortunately, I no longer have the logs from when this happened. They
> were running with DEBUG and they rotated.
> Any ideas?

Re: avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Posted by German Blanco <ge...@gmail.com>.
Thank you, I will check. I must admit I haven't searched for this enough :-(
I have also noticed that there is a development option to force
synchronisation via snapshot for 3.5.0. That should avoid the problem.
Still, there is something strange that I have noticed. There were two nodes
showing up in one of the followers of the ensemble (3.4.5 with 3 nodes)
that were not there in the rest, and they were ephemeral nodes. I don't
think that it is easy for ephemeral nodes to be included in data files from
another ensemble, since normally the session that created them wouldn't be
there and they would expire.
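To make that reasoning concrete, here is a minimal sketch of the expectation. It is a simplified model and not ZooKeeper's actual expiry code. The function name and the sample data are made up; the only ZooKeeper fact it relies on is that a znode's ephemeralOwner is the owning session id, or 0 for a persistent node.

```python
def surviving_nodes(nodes, live_sessions):
    """Keep persistent nodes and ephemeral nodes whose owner session is live.

    `nodes` maps a path to its ephemeral owner session id; 0 marks a
    persistent node, mirroring the ephemeralOwner field of a znode's Stat.
    """
    return {
        path: owner
        for path, owner in nodes.items()
        if owner == 0 or owner in live_sessions
    }


if __name__ == "__main__":
    # Hypothetical data files carried over from another ensemble.
    nodes = {
        "/app": 0,                   # persistent
        "/app/locks/member-1": 101,  # ephemeral, owned by session 101
        "/app/locks/member-2": 102,  # ephemeral, owned by session 102
    }
    # Session ids known to the current ensemble; 101 and 102 are absent.
    live_sessions = {201, 202}
    print(surviving_nodes(nodes, live_sessions))
```

Under this model, ephemeral nodes coming from a foreign ensemble would not outlast their missing sessions, which is why seeing them on only one follower is surprising.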
Unfortunately, I no longer have the logs from when this happened. They
were running with DEBUG and they rotated.
Any ideas?


On Fri, Sep 27, 2013 at 2:00 AM, Alexander Shraer <sh...@gmail.com> wrote:

> Some time in the past we were discussing adding a unique identifier
> for each ensemble in the config files and checking it, for example
> when a server tries to connect to the leader. I'm not sure if there is a Jira
> for this.

Re: avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Posted by Alexander Shraer <sh...@gmail.com>.
Some time in the past we were discussing adding a unique identifier
for each ensemble in the config files and checking it, for example
when a server tries to connect to the leader. I'm not sure if there is a Jira
for this.
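
A rough sketch of what such a check could look like, purely as an illustration. The `ensemble_id` setting, the class, and the function below are hypothetical and are not part of ZooKeeper's configuration or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    server_id: int
    ensemble_id: str  # hypothetical setting; no such ZooKeeper option is implied


class EnsembleMismatch(Exception):
    """Raised when a server presents an identifier from another ensemble."""


def accept_learner(leader: ServerConfig, learner: ServerConfig) -> None:
    """Reject a connecting server whose ensemble identifier differs from the leader's."""
    if learner.ensemble_id != leader.ensemble_id:
        raise EnsembleMismatch(
            f"server {learner.server_id} has ensemble id {learner.ensemble_id!r}, "
            f"leader expects {leader.ensemble_id!r}"
        )


if __name__ == "__main__":
    leader = ServerConfig(server_id=1, ensemble_id="prod-a")
    for learner in (ServerConfig(2, "prod-a"), ServerConfig(3, "old-test")):
        try:
            accept_learner(leader, learner)
            print(f"server {learner.server_id}: accepted")
        except EnsembleMismatch as err:
            print(f"server {learner.server_id}: rejected ({err})")
```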


On Wed, Sep 25, 2013 at 7:44 AM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Exactly.
> I know it is silly, but I think this is what happened, and I would feel
> better if there was a way to prevent it from happening again.

Re: avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Posted by German Blanco <ge...@gmail.com>.
Exactly.
I know it is silly, but I think this is what happened, and I would feel
better if there was a way to prevent it from happening again.


On Wed, Sep 25, 2013 at 4:37 AM, Benjamin Reed <br...@apache.org> wrote:

> when you say inconsistent transaction log, are you talking about a
> transaction log from a different ensemble instance?
>
> for example, you ran zookeeper and did some things. then you reset all
> the servers but one and restarted everything.
>
> ben

Re: avoiding the risk of starting ZooKeeper servers in the same ensemble with different transaction logs

Posted by Benjamin Reed <br...@apache.org>.
when you say inconsistent transaction log, are you talking about a
transaction log from a different ensemble instance?

for example, you ran zookeeper and did some things. then you reset all
the servers but one and restarted everything.

ben


On Tue, Sep 24, 2013 at 5:45 AM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Hello,
> I have run into this situation a couple of times.
> Because of an error, one ZooKeeper server in the ensemble is started
> with an inconsistent transaction log. This leads to serious problems that are
> difficult to trace, until you notice that clients connected to one of the
> servers see a different data tree than the others.
> I would really like to avoid this, and the amount of data in my data tree
> happens to be small (around 40 kBytes). So I would like to
> propose a new option to force synchronization via snapshot in the ZooKeeper
> Leader.
> Any opinions?
> Any other options?
> Regards,
>
> Germán Blanco.
>