Posted to dev@zookeeper.apache.org by "Abhishek .E.S" <ab...@gmail.com> on 2013/03/14 17:49:29 UTC

Storing znode on disks

Hi,

I am new to ZooKeeper.
I have a question. ZooKeeper places znodes in memory to optimize data access.
I am working on an experiment for which I intend to use ZooKeeper.
For me, latency is acceptable, but I require the znodes to be on disk.

Can this be achieved?
If so, could someone please give me some pointers?

Thanks and Regards,
Abhishek

RE: Storing znode on disks

Posted by Rakesh R <ra...@huawei.com>.
Hi Abhishek,

Could you give more details on the data set and the use case?

ZooKeeper is designed to manage coordination data; it is not designed to be a general-purpose database or a large object store. Coordination data is usually relatively small, measured in kilobytes. If your data is very large, I suggest either splitting it across multiple znodes (but this again can cause lots of problems with watches and atomicity) or storing it in HDFS/NFS.
But it depends on your use case and requirements.
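
A minimal Java sketch of the splitting option (Java because ZooKeeper itself is written in Java). It makes no ZooKeeper calls: the class name, the '/data/blob' parent path and the 'chunk-NNNN' child naming are hypothetical, and in practice the chunk size would have to stay below the server's size limit. It only shows how a large payload could be cut into pieces small enough for individual znodes and put back together in order. Because each piece would be a separate znode written by a separate request, a reader can see a mix of old and new pieces while an update is in progress, which is one source of the watch and atomicity problems mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: cuts a payload into pieces that could each be stored
// in its own child znode. No ZooKeeper API is used; this is bookkeeping only.
final class ZnodeChunker {

    // Split data into consecutive pieces of at most maxChunkBytes bytes.
    static List<byte[]> split(byte[] data, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += maxChunkBytes) {
            int end = Math.min(data.length, off + maxChunkBytes);
            chunks.add(Arrays.copyOfRange(data, off, end));
        }
        return chunks;
    }

    // Zero-padded names sort in the same order as the chunk indices.
    static String chunkPath(String parent, int index) {
        return String.format("%s/chunk-%04d", parent, index);
    }

    // Reassemble pieces that were read back in index order.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) {
            out.write(c, 0, c.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        List<byte[]> chunks = split(payload, 4); // pieces of 4, 4 and 2 bytes
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println(chunkPath("/data/blob", i) + " -> " + chunks.get(i).length + " bytes");
        }
        System.out.println("round trip ok: " + Arrays.equals(payload, join(chunks)));
    }
}
```

Since the child list of a znode comes back unordered, a reader would sort the zero-padded names before joining the pieces.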

The ZooKeeper client and server implementations have sanity checks to ensure that znodes hold only a small amount of data. Also, the maximum znode data size can be configured with the 'jute.maxbuffer' Java system property; by default it is 0xfffff bytes, just under 1 MB.
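
The size limit can be illustrated with a small Java sketch. The ZooKeeper Administrator's Guide documents 'jute.maxbuffer' as a Java system property (with no "zookeeper." prefix) whose default is 0xfffff, and says that if it is changed it must be set on all servers and clients. The class below is hypothetical: it reads the property in the same spirit, via Integer.getInteger with that default, and compares a raw payload length against it. The real check is applied to serialized packets, which carry some overhead, so treat this as an approximation rather than ZooKeeper's actual test.

```java
// Hypothetical pre-flight check for znode payload sizes; not ZooKeeper code.
final class ZnodeSizeCheck {
    // Default documented for jute.maxbuffer: 0xfffff = 1,048,575 bytes (just under 1 MB).
    static final int ASSUMED_DEFAULT = 0xfffff;

    // Read the JVM system property, falling back to the documented default.
    static int maxBuffer() {
        return Integer.getInteger("jute.maxbuffer", ASSUMED_DEFAULT);
    }

    // Raw payload comparison only; serialization overhead is ignored here.
    static boolean fits(byte[] data, int limit) {
        return data.length <= limit;
    }

    public static void main(String[] args) {
        int limit = maxBuffer();
        byte[] small = new byte[4 * 1024];        // typical coordination data
        byte[] large = new byte[2 * 1024 * 1024]; // a 2 MB blob
        System.out.println("limit = " + limit + " bytes");
        System.out.println("4 KB fits: " + fits(small, limit));
        System.out.println("2 MB fits: " + fits(large, limit));
    }
}
```

Starting the JVM with, for example, -Djute.maxbuffer=4194304 changes the printed limit, which is the standard way to pass a system property to Java.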

-Rakesh
________________________________________
From: Thawan Kooburat [thawan@fb.com]
Sent: Friday, March 15, 2013 1:39 AM
To: dev@zookeeper.apache.org
Subject: Re: Storing znode on disks

This depends on the data size and availability requirements of your use
case.

In theory, the total data size in ZooKeeper is bounded by the size of RAM. However,
if you store several gigabytes of data in ZooKeeper, the server load time
will be quite long (minutes), depending on your disk bandwidth. When there
is a leader election, every server needs to reload the data from disk into
memory, so the quorum is considered unavailable during this period.



--
Thawan Kooburat






Re: Storing znode on disks

Posted by "Abhishek .E.S" <ab...@gmail.com>.
Hi,

We intend to do some experiments on the throughput we can
achieve on our hardware when using a consensus protocol like Zab. This is
purely for benchmarking and not for a practical deployment. In summary, we
want to use the Zab protocol for reaching consensus and hence want to
modify the ZooKeeper implementation for this benchmarking.

Would it be possible to store znodes on disk for this purpose?

-Abhishek

On Fri, Mar 15, 2013 at 5:11 PM, Edward Ribeiro <ed...@gmail.com>wrote:

> On Thu, Mar 14, 2013 at 3:13 PM, Abhishek .E.S <abhishek.es@gmail.com
> >wrote:
>
> > Could I build a large-scale data store using ZooKeeper, though?
> >
>
> Yep, but ZooKeeper should be used as the coordinator of this data store.
> For example, suppose you are building a sharded database. Then, ZK could be
> used to store the *mapping* of data partitions to machines. Or ZK could be
> used to provide high availability of servers so that, if some machines
> fail, the data store would still be able to serve requests (this
> scenario is used by HBase, by the way). Or you could use ZK to implement
> some sort of distributed transaction protocol for the data store (as in
> Calvin, a research database system being developed at Yale
> University). As you can see from these examples, ZK is used as a
> component of the data store, but it is in charge of metadata and
> coordination, not the large-scale user data.
>
> Edward

Re: Storing znode on disks

Posted by Edward Ribeiro <ed...@gmail.com>.
On Thu, Mar 14, 2013 at 3:13 PM, Abhishek .E.S <ab...@gmail.com>wrote:

> Could I build a large-scale data store using ZooKeeper, though?
>

Yep, but ZooKeeper should be used as the coordinator of this data store.
For example, suppose you are building a sharded database. Then, ZK could be
used to store the *mapping* of data partitions to machines. Or ZK could be
used to provide high availability of servers so that, if some machines
fail, the data store would still be able to serve requests (this
scenario is used by HBase, by the way). Or you could use ZK to implement
some sort of distributed transaction protocol for the data store (as in
Calvin, a research database system being developed at Yale
University). As you can see from these examples, ZK is used as a
component of the data store, but it is in charge of metadata and
coordination, not the large-scale user data.
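
The first example above, ZooKeeper holding only the partition-to-machine mapping, might look roughly like this Java sketch. The '/shards/shard-<n>' layout, the host names and the hash-modulo partitioning are all assumptions made for illustration (not what HBase or Calvin do), and a TreeMap stands in for the ZooKeeper namespace so the example runs without a server. The point is the size split: the coordination data is a few bytes per partition, while the user data lives on the machines the mapping points to.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical layout: one small znode per partition, /shards/shard-<n>,
// whose data is the address of the machine serving that partition.
// A TreeMap simulates the znodes; real code would read them from ZooKeeper.
final class ShardMap {
    private final Map<String, byte[]> znodes = new TreeMap<>();
    private final int shardCount;

    ShardMap(String[] hosts) {
        if (hosts.length == 0) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.shardCount = hosts.length;
        for (int i = 0; i < hosts.length; i++) {
            znodes.put(path(i), hosts[i].getBytes(StandardCharsets.UTF_8));
        }
    }

    static String path(int shard) {
        return "/shards/shard-" + shard;
    }

    // Map a user key to a partition; floorMod keeps negative hash codes in range.
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    // Only this lookup touches coordination data; the user data itself would
    // then be fetched from the returned host, not from ZooKeeper.
    String hostFor(String key) {
        return new String(znodes.get(path(shardFor(key))), StandardCharsets.UTF_8);
    }

    // Total bytes stored in the simulated znodes.
    int metadataBytes() {
        int total = 0;
        for (byte[] value : znodes.values()) {
            total += value.length;
        }
        return total;
    }

    public static void main(String[] args) {
        ShardMap map = new ShardMap(new String[] {"db1:9000", "db2:9000", "db3:9000"});
        System.out.println("user:42 -> " + map.hostFor("user:42"));
        System.out.println("metadata bytes: " + map.metadataBytes());
    }
}
```

With three hosts the simulated znodes hold 24 bytes in total, no matter how much data the shards themselves store. A real client would also set watches on these znodes so that it notices when a partition moves.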

Edward





-- 
*"Killing a lion a day is easy. The hard part is dodging the fools.", anonymous*

Re: Storing znode on disks

Posted by Thawan Kooburat <th...@fb.com>.
This depends on the data size and availability requirements of your use
case.

In theory, the total data size in ZooKeeper is bounded by the size of RAM. However,
if you store several gigabytes of data in ZooKeeper, the server load time
will be quite long (minutes), depending on your disk bandwidth. When there
is a leader election, every server needs to reload the data from disk into
memory, so the quorum is considered unavailable during this period.
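
The "minutes" figure can be sanity-checked with back-of-envelope arithmetic: reload time is at least the snapshot size divided by the sustained disk read bandwidth. The Java sketch below uses an assumed 50 MiB/s and a few illustrative sizes; both are placeholders, not measurements, and the estimate ignores deserialization, log replay and syncing with the leader, all of which add time.

```java
// Back-of-envelope only: assumes reload time is dominated by reading the
// snapshot sequentially at a fixed bandwidth. Other costs are ignored.
final class ReloadEstimate {

    // Lower bound on load time in seconds: size / bandwidth.
    static double seconds(double snapshotMiB, double diskMiBPerSec) {
        return snapshotMiB / diskMiBPerSec;
    }

    public static void main(String[] args) {
        double[] sizesGiB = {1, 4, 8};  // illustrative data sizes
        double bandwidth = 50.0;        // MiB/s, an assumed figure
        for (double gib : sizesGiB) {
            double s = seconds(gib * 1024, bandwidth);
            System.out.printf("%.0f GiB at %.0f MiB/s: ~%.0f s (~%.1f min)%n",
                    gib, bandwidth, s, s / 60);
        }
    }
}
```

With these assumed figures, 4 GiB works out to about 82 seconds of pure reading, already more than a minute before any of the other costs are added.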

 

-- 
Thawan Kooburat





On 3/14/13 11:13 AM, "Abhishek .E.S" <ab...@gmail.com> wrote:

>Could I build a large-scale data store using ZooKeeper, though?
>


Re: Storing znode on disks

Posted by "Abhishek .E.S" <ab...@gmail.com>.
Could I build a large-scale data store using ZooKeeper, though?

On Thu, Mar 14, 2013 at 12:52 PM, Edward Ribeiro
<ed...@gmail.com>wrote:

> >> For me, latency is acceptable but I require the znodes to be on  disk.
>
> Why would you need to do that?
>
> ZooKeeper stores the dataTree in memory, but it also takes periodic snapshots
> to disk and syncs a commit log to disk, so that a node can
> recover in case of failures. If you are asking to store znodes *only* on
> disk, then the answer is no (AFAIK!).
>
> Last but not least, you should be aware that znodes are not intended to
> store large quantities of data; ZooKeeper is not meant to be a database, but a
> coordination system.
>
> Edward

Re: Storing znode on disks

Posted by Edward Ribeiro <ed...@gmail.com>.
>> For me, latency is acceptable but I require the znodes to be on  disk.

Why would you need to do that?

ZooKeeper stores the dataTree in memory, but it also takes periodic snapshots
to disk and syncs a commit log to disk, so that a node can
recover in case of failures. If you are asking to store znodes *only* on
disk, then the answer is no (AFAIK!).
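
To make the "tree in memory, snapshots and commit log on disk" arrangement concrete, here is a deliberately small Java toy model. It is not ZooKeeper's code, and every name in it is hypothetical: each update is appended to a log file and fsynced before being applied to an in-memory map, a snapshot writes the whole map out so the log can be discarded, and recovery loads the snapshot and replays the log on top of it. The real server is considerably more involved.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical names, NOT ZooKeeper's implementation).
// Paths and data must not contain tab or newline characters.
final class ToyStore {
    private final Path logFile;
    private final Path snapFile;
    private final Map<String, String> tree = new TreeMap<>(); // znode path -> data, in RAM

    ToyStore(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        logFile = dir.resolve("commit.log");
        snapFile = dir.resolve("snapshot");
        recover();
    }

    // Append to the log and fsync it, and only then apply the update in memory.
    void set(String path, String data) {
        try (FileOutputStream out = new FileOutputStream(logFile.toFile(), true)) {
            out.write((path + "\t" + data + "\n").getBytes(StandardCharsets.UTF_8));
            out.getFD().sync();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        tree.put(path, data);
    }

    // Reads are served from memory only.
    String get(String path) {
        return tree.get(path);
    }

    // Write the whole tree to a new snapshot file, then drop the log it now covers.
    void snapshot() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : tree.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        try {
            Path tmp = snapFile.resolveSibling("snapshot.tmp");
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            Files.move(tmp, snapFile, StandardCopyOption.REPLACE_EXISTING);
            Files.deleteIfExists(logFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recovery: load the snapshot, then replay the log on top of it. Replaying
    // entries the snapshot already reflects is harmless: applied in order,
    // they end at the same final values.
    private void recover() {
        try {
            for (Path p : new Path[] {snapFile, logFile}) {
                if (!Files.exists(p)) continue;
                for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                    String[] kv = line.split("\t", 2);
                    if (kv.length == 2) tree.put(kv[0], kv[1]); // ignore malformed lines
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "toystore-" + System.nanoTime());
        ToyStore store = new ToyStore(dir);
        store.set("/app/config", "v1");
        store.snapshot();
        store.set("/app/config", "v2");          // only in the log so far
        ToyStore restarted = new ToyStore(dir);  // simulate a restart from disk
        System.out.println(restarted.get("/app/config")); // prints v2
    }
}
```

In this model, get() never touches the disk; the files exist only so that state survives a restart. Keeping znodes only on disk would, even here, mean redesigning the read path rather than changing a setting.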

Last but not least, you should be aware that znodes are not intended to
store large quantities of data; ZooKeeper is not meant to be a database, but a
coordination system.

Edward

On Thu, Mar 14, 2013 at 1:49 PM, Abhishek .E.S <ab...@gmail.com>wrote:

> Hi,
>
> I am new to ZooKeeper.
> I have a question. ZooKeeper places znodes in memory to optimize data
> access.
> I am working on an experiment for which I intend to use ZooKeeper.
> For me, latency is acceptable, but I require the znodes to be on disk.
>
> Can this be achieved?
> If so, could someone please give me some pointers?
>
> Thanks and Regards,
> Abhishek
>



-- 
*"Killing a lion a day is easy. The hard part is dodging the fools.", anonymous*