Posted to dev@zookeeper.apache.org by Sergey Maslyakov <ev...@gmail.com> on 2013/07/19 20:19:17 UTC

Re: Zookeeper ensemble backup questions?

Jack,

Here is how I see the backup process happening.

1. The Zookeeper server can be changed to support a new 4lw (four-letter-word
command) that writes out the current state of the DataTree into a snapshot
file, with the path and name provided as an argument to this new command
(subject to permissions, disk space, and other system-level restrictions).
I would probably ask Zookeeper to save the snapshot in a directory outside
of the standard "dataLog" for the sake of cleanliness.

2. When the Zookeeper server responds to the new "snapshot" command with a
success indication, the requesting process knows that the file has been
written out and can go and process it. It can add some metadata and
create an archive to store somewhere, for example. Alternatively, the
Zookeeper server could stream the data it would have written into a
snapshot as the response to the new "snapshot" command. This way, the
client becomes responsible for persistence, which lifts a number of
permission-related issues (but raises some other issues too). Oh, and by
the way, snapshot files appear to be rather compressible. I have seen
a compression factor of 20 or more on my own data.

3. Disk cleanups are performed.
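
The "add some metadata and create an archive" part of step 2 could look roughly like the sketch below. It is only an illustration: the function name, the paths, and the metadata fields are assumptions, not part of any Zookeeper tooling.

```python
import gzip
import json
import shutil
import time
from pathlib import Path


def archive_snapshot(snapshot_path, archive_dir):
    """Compress a snapshot file and write a small metadata file next to it.

    Both arguments are hypothetical paths chosen by the backup job.
    Returns the path of the compressed copy.
    """
    snapshot = Path(snapshot_path)
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Stream the snapshot through gzip so large files are not held in memory.
    compressed = target_dir / (snapshot.name + ".gz")
    with open(snapshot, "rb") as src, gzip.open(compressed, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Illustrative metadata only; record whatever the restore side needs.
    metadata = {
        "source": str(snapshot),
        "original_bytes": snapshot.stat().st_size,
        "compressed_bytes": compressed.stat().st_size,
        "archived_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    meta_path = target_dir / (snapshot.name + ".json")
    meta_path.write_text(json.dumps(metadata, indent=2))
    return compressed
```

Comparing "original_bytes" with "compressed_bytes" in the metadata gives the compression factor for a particular data set.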

With this backup procedure the restore would turn into:

1. Stopping all ensemble members

2. Wiping out dataDir/version-2 and dataLogDir/version-2

3. Restoring the snapshot taken by the above backup procedure on one of the
servers into dataDir/version-2

4. Bringing this server online

5. Allowing some time for it to load the snapshot. You could send the "isro"
4lw command to it to see when it stops responding with "null". When the
response becomes "ro" or "rw", it is ready to populate the others
with its own data

6. Bringing up the other servers one by one, to allow them to form a quorum
with the populated server
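
The polling in step 5 might be scripted along these lines. This is a minimal sketch: the helper names are made up for illustration, and the retry count and delay are arbitrary assumptions.

```python
import socket
import time


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter-word command and return the server's reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace").strip()


def wait_until_serving(query, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll query() until it returns "ro" or "rw".

    query is any zero-argument callable that returns the "isro" reply.
    Returns the final reply, or None if the server never became ready.
    """
    for _ in range(attempts):
        try:
            reply = query()
        except OSError:
            reply = None  # server is not accepting connections yet
        if reply in ("ro", "rw"):
            return reply
        sleep(delay)
    return None
```

With the client port used elsewhere in this thread, a call could be wait_until_serving(lambda: four_letter_word("localhost", 12181, "isro")).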

Hope this helps! I'd be glad to hear from people who know the internals of
Zookeeper server better whether this approach is flawed or robust.

Regards,
/Sergey

On Fri, Jul 19, 2013 at 1:00 PM, jack ma <ja...@gmail.com> wrote:

> I asked these questions in the thread
>
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201307.mbox/%3cCAB+cfdwhOV0JfB04=MpO_+i-4ou=VbL=EG2XS557+j+698jx3A@mail.gmail.com%3e
>
> but there has been no response.
>
> So I am posting the questions again here, hoping to get help
> from the community.
>
> I want to make sure I fully understand the procedures for zookeeper
> backup and disaster recovery:
>
> For the backup procedure on a zookeeper ensemble:
> (1) Log in to any host whose state is "Serving"
>            Question:
>                   Do I have to log in to the leader node, or is any node OK?
> (2) Copy the latest snapshot file and transaction log from the version-2
> directory.
>            Question:
>                   How do we make sure we do not copy corrupt files if the
> snapshot/transaction log is in the middle of an update? Do we have to shut
> down the node to make the copy?
>                   Besides the transaction log and snapshot, do we have to
> copy other files, such as the epoch files?
>
> For the disaster recovery procedure on a zookeeper ensemble:
> (1) Recreate the machines for the zookeeper ensemble
> (2) Copy the snapshot/transaction log we backed up into the zookeeper
> dataDir/version-2 and dataLogDir/version-2.
>            Question:
>                  Do we have to copy the epoch files?
>                  Do we have to copy the backed-up snapshot/transaction log
> to all the zookeeper nodes, or just to the first node we start?
>
> Appreciate your time and help.
> Jack
>

Re: Zookeeper ensemble backup questions?

Posted by Thawan Kooburat <th...@fb.com>.

You don't need to stop ZooKeeper or wait until the system becomes idle.
You can do a live backup.

When you perform a backup and then restore from it, you get the same
result as if ZooKeeper had crashed and restarted at some point while the
backup was being taken.

Again, if you need a backup of a very specific point in time, you can
either stop zookeeper or block all client ports. Or write a tool to
process the txnlog if stopping zookeeper is not possible.
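
For a live backup that copies less than the whole directory, a file-selection step could be sketched as follows. It assumes the usual version-2 naming, snapshot.<hex zxid> and log.<hex zxid>, with a log's suffix being the zxid of its first transaction. The function names and the default of keeping two snapshots are illustrative assumptions; keeping an older snapshot is a cautious choice because the newest one may still be in the middle of being written.

```python
from pathlib import Path


def zxid_of(path):
    """Return the zxid encoded as hex in a name such as snapshot.316c8."""
    return int(Path(path).name.split(".", 1)[1], 16)


def select_backup_files(snap_names, log_names, keep_snapshots=2):
    """Pick the newest snapshots and the transaction logs needed to replay them."""
    if keep_snapshots < 1:
        raise ValueError("keep_snapshots must be at least 1")
    snaps = sorted(snap_names, key=zxid_of)[-keep_snapshots:]
    logs = sorted(log_names, key=zxid_of)
    if not snaps:
        return [], logs  # no snapshot yet: every log is needed
    oldest = zxid_of(snaps[0])
    # Transactions following the oldest kept snapshot may live in the last
    # log whose starting zxid is <= that snapshot's zxid, so start there.
    start = 0
    for index, name in enumerate(logs):
        if zxid_of(name) <= oldest:
            start = index
    return snaps, logs[start:]
```

The returned lists are the files to copy from the snapshot and transaction log directories respectively.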


-- 
Thawan Kooburat





On 7/19/13 3:32 PM, "jack ma" <ja...@gmail.com> wrote:

>Thanks. Yes, we could just back up the latest snapshot and transaction log.
>But how do we prevent those files from being updated by zookeeper during
>our backup procedure? You mentioned making the server idle. What does that
>mean? Does it put the whole zookeeper ensemble into idle, or just the server
>we back up from? What is the zookeeper command for that?
>
>I appreciate your help.
>Jack

Re: Zookeeper ensemble backup questions?

Posted by jack ma <ja...@gmail.com>.

Thanks. Yes, we could just back up the latest snapshot and transaction log.
But how do we prevent those files from being updated by zookeeper during our
backup procedure? You mentioned making the server idle. What does that mean?
Does it put the whole zookeeper ensemble into idle, or just the server we
back up from? What is the zookeeper command for that?

I appreciate your help.
Jack


On Fri, Jul 19, 2013 at 2:24 PM, Thawan Kooburat <th...@fb.com> wrote:

> If you back up the entire data/snap dir at any given time on one of the
> machines, you can restore the system using the backup. It will include
> all the committed txns and some in-flight txns up to that point.
>
> If you need to optimize the size of the backup, then you will have
> to understand how the server loads the snapshot and txnlog. In short, you
> don't need the whole dir, just the most recent files.
>
> If you really need to be specific about which txns are included in the
> backup, one easy way is to make the system idle before doing the backup.
> Otherwise, you need to write a tool to process the txnlog and snapshot
> file.
>
> So there is nothing preventing you from doing a zookeeper backup on the
> current release (although it might not be convenient and requires some
> knowledge of zookeeper).
>
> --
> Thawan Kooburat

Re: Zookeeper ensemble backup questions?

Posted by jack ma <ja...@gmail.com>.

Thanks Sergey.

That is great. Did you contribute your work back to zookeeper? When you take
a snapshot, do you have to block zookeeper from accepting requests?


On Fri, Jul 19, 2013 at 12:29 PM, Sergey Maslyakov <ev...@gmail.com>wrote:

> A word of preemptive self-defense: I am not an experienced Java developer.
> Please, don't throw rotten eggs at me if I did not follow well-known Java
> coding patterns :)
>
>
> Regards,
> /Sergey

Re: Zookeeper ensemble backup questions?

Posted by Sergey Maslyakov <ev...@gmail.com>.

A word of preemptive self-defense: I am not an experienced Java developer.
Please, don't throw rotten eggs at me if I did not follow well-known Java
coding patterns :)


Regards,
/Sergey



Re: Zookeeper ensemble backup questions?

Posted by Sergey Maslyakov <ev...@gmail.com>.

I can share this patch based on 3.4.5, which does the trick.

It adds a "snps" 4lw command that accepts one mandatory argument, which is
an absolute path for the directory where the snapshot file will be dropped.
The "absoluteness" of the path is verified by UNIX rules. Not sure how it
would work in Windows, though. The target directory must exist and be
writable by the effective UID of the Zookeeper server.

If the operation is successful, the Zookeeper server responds with the
absolute path of the snapshot file. You can watch for the '/' character to
trigger your reaction to the response.

In my case, a 700MB snapshot takes about 30 seconds to write out.

Please see several examples below:

~ $ mkdir /tmp/snapshot-test

~ $ telnet localhost 12181
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
snps /tmp/snapshot-test
/tmp/snapshot-test/snapshot.316c8
Connection to localhost closed by foreign host.

~ $ ls -al /tmp/snapshot-test/snapshot.316c8
-rw-r--r--   1 srvr     srvr     719602373 Jul 19 14:09 /tmp/snapshot-test/snapshot.316c8

~ $ telnet localhost 12181
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
snps blah
Snapshot directory path must be absoulte, i.e., it must start with '/'.
Path "blah" does not meet the criteria.
Connection to localhost closed by foreign host.

~ $ telnet localhost 12181
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
snps /tmp/blah
Error while serializing snapshot into /tmp/blah/snapshot.316c8.
/tmp/blah/snapshot.316c8 (No such file or directory)
Connection to localhost closed by foreign host.

~ $ telnet localhost 12181
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
snps
Snapshot directory path must be absoulte, i.e., it must start with '/'.
Path "" does not meet the criteria.
Connection to localhost closed by foreign host.

~ $
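
The telnet sessions above could be scripted along these lines. This is only
a sketch under assumptions: "snps" exists only on a server built with the
patch, the command is assumed to be read as a newline-terminated line (as
telnet sends it), and the function names are made up for illustration.

```python
import socket


def send_four_letter_word(host, port, command, timeout=120.0):
    """Send a 4lw command and return everything the server writes
    before it closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Assumption: the patched server reads the command as one line.
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_snps_response(response):
    """Return the snapshot path on success; raise RuntimeError carrying
    the server's message otherwise."""
    text = response.strip()
    # Success is signalled by a response that starts with '/'.
    if text.startswith("/"):
        return text.splitlines()[0]
    raise RuntimeError(text or "empty response")
```

A call would look like
parse_snps_response(send_four_letter_word("localhost", 12181, "snps /tmp/snapshot-test")).
The generous default timeout reflects the roughly 30 seconds a 700MB
snapshot took in my case.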





Re: Zookeeper ensemble backup questions?

Posted by Thawan Kooburat <th...@fb.com>.

If you back up the entire data/snap dir at any given time on one of the
machines, you can restore the system using that backup. It will include
all the committed txns and some in-flight txns up to that point.

If you need to optimize the size of the backup, then you will have to
understand how the server loads the snapshot and txnlog. In short, you
don't need the whole dir, just the most recent files.
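
As a rough illustration of "just the most recent files", here is a sketch
that picks them by name. It assumes the 3.4.x naming scheme (snapshot.<hex
zxid> and log.<hex zxid>) and my understanding that the server replays the
newest txnlog starting at or before the snapshot's zxid plus every later
one. The function names are made up, and if the newest snapshot may still
be in the middle of being written, it is safer to keep an older one too.

```python
def zxid_of(name, prefix):
    """Return the zxid encoded in 'prefix.<hex>' as an int, or None if
    the file name does not follow that pattern."""
    head, sep, tail = name.partition(".")
    if head != prefix or not sep:
        return None
    try:
        return int(tail, 16)
    except ValueError:
        return None


def _parse(names, prefix):
    """Return sorted (zxid, name) pairs for the names that match prefix."""
    parsed = []
    for name in names:
        zxid = zxid_of(name, prefix)
        if zxid is not None:
            parsed.append((zxid, name))
    return sorted(parsed)


def files_to_back_up(snapshot_names, log_names):
    """Pick the newest snapshot, the newest log that starts at or before
    it, and all logs that start after it."""
    snaps = _parse(snapshot_names, "snapshot")
    logs = _parse(log_names, "log")
    if not snaps:
        raise ValueError("no snapshot files found")
    snap_zxid, snap_name = snaps[-1]
    start = 0
    for i, (zxid, _) in enumerate(logs):
        if zxid <= snap_zxid:
            start = i  # newest log that starts at or before the snapshot
    return [snap_name] + [name for _, name in logs[start:]]
```

The inputs would be the directory listings of dataDir/version-2 and
dataLogDir/version-2.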

If you really need to be specific about which txns are included in the
backup, one easy way is to make the system idle before doing the backup.
Otherwise, you need to write a tool to process the txnlog and snapshot
files.


So nothing prevents you from doing a zookeeper backup on the current
release (although it might not be convenient and requires some knowledge
of zookeeper).

-- 
Thawan Kooburat






Re: Zookeeper ensemble backup questions?

Posted by jack ma <ja...@gmail.com>.

Thanks Sergey.

That is a great improvement idea for zookeeper. I think that zookeeper is
planning to add a new 4lw command "snap", but it is not ready yet.

My original questions are based on the current version of zookeeper (3.4.5).
Do you have any answers for them?

I appreciate the help.

Thanks,
Jack




On Fri, Jul 19, 2013 at 11:19 AM, Sergey Maslyakov <ev...@gmail.com> wrote:

> Jack,
>
> Here is how I see the backup process happening.
>
> 1. Zookeeper server can be changed to support a new 4lw that will write out
> the current state of the DataTree into a snapshot file with the path and
> name provided as an argument to this new command (barring all the
> permissions, disk space, and other system-level restrictions). Probably, I
> would ask Zookeeper to save the snapshot in a directory outside of the
> standard "dataLog" for the sake of cleanliness.
>
> 2. When Zookeeper server responds to the new "snapshot" command with
> success indication, the requesting process knows that the file has been
> written out and it can go and process it. It can add some metadata and
> create an archive to store it somewhere, for example. Alternatively,
> Zookeeper server could stream the data it would have written into a
> snapshot as the response to the new "snapshot" command. This way, the
> client becomes responsible for persistence and this lifts a number of
> permission-related issues (but raises some other issues too). Oh, and by
> the way, it looks like snapshot files are rather compressible. I did see
> the factor of 20 and more on the data that I have.
>
> 3. Disk cleanups are performed.
>
> With this backup procedure the restore would turn into:
>
> 1. Stopping all ensemble members
>
> 2. Wiping out dataDir/version-2 and dataLogDir/version-2
>
> 3. Restoring the snapshot taken by the above backup procedure on one of the
> servers into dataDir/version-2
>
> 4. Bringing this server online
>
> 5. Allowing some time for it to load the snapshot. You could send "isro"
> 4lw command to it to see when it stops responding with "null". When the
> response becomes "ro" or "rw", this is when it is ready to populate others
> with its own data
>
> 6. Bring up other servers one-by-one, to allow them to form a quorum with
> the populated server
>
>
> Hope this helps! I'd be glad to hear from people who know the internals of
> Zookeeper server better whether this approach is flawed or robust.
>
>
> Regards,
> /Sergey
>
>
> On Fri, Jul 19, 2013 at 1:00 PM, jack ma <ja...@gmail.com> wrote:
>
> > I asked these questions in the thread
> >
> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201307.mbox/%3cCAB+cfdwhOV0JfB04=MpO_+i-4ou=VbL=EG2XS557+j+698jx3A@mail.gmail.com%3e
> >
> > but there was no response to them.
> >
> > So I am posting the questions again here; hopefully I can get help
> > from the community.
> >
> > I want to make sure I fully understand the procedures for zookeeper
> > backup and disaster recovery:
> >
> > For the backup procedure on the zookeeper ensemble:
> > (1) Log in to any host whose state is "Serving"
> >            Question:
> >                   Do I have to log in to the leader node, or is any node ok?
> > (2) Copy the latest snapshot file and transaction log from the version-2
> > directory.
> >            Question:
> >                   How do we make sure we do not copy corrupt files if the
> > snapshot/transaction log is in the middle of an update? Do we have to
> > shut down the node to make the copy?
> >                   Besides the transaction log and snapshot, do we have to
> > copy other files such as the epoch files?
> >
> > For the disaster recovery procedure on the zookeeper ensemble:
> > (1) recreate the machines for the zookeeper ensemble
> > (2) copy the snapshot/transaction log we backed up into the zookeeper
> > dataDir/version-2 and dataLogDir/version-2.
> >            Question:
> >                  Do we have to copy the epoch files?
> >                  Do we have to copy the backed-up snapshot/transaction log
> > to all the zookeeper nodes, or just to the first node we start?
> >
> > Appreciate your time and help.
> > Jack
> >
>
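
The readiness check in step 5 of the restore procedure quoted above could
be sketched like this. It is an illustration only: the function names, the
retry count, and the polling interval are assumptions; "isro" itself is the
stock 4lw that answers "ro", "rw", or "null".

```python
import socket
import time


def isro(host, port, timeout=5.0):
    """Send the 'isro' 4lw and return the trimmed reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"isro")
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace").strip()


def wait_until_serving(query, attempts=60, interval=1.0):
    """Call query() until it returns 'ro' or 'rw'. Return that mode, or
    None if the attempts run out first."""
    for _ in range(attempts):
        try:
            mode = query()
        except OSError:
            mode = None  # server is not accepting connections yet
        if mode in ("ro", "rw"):
            return mode
        time.sleep(interval)
    return None
```

For example, wait_until_serving(lambda: isro("localhost", 12181)) would
block until the restored server reports "ro" or "rw", at which point the
other servers can be brought up.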