
Posted to user@zookeeper.apache.org by Ertio Lew <er...@gmail.com> on 2011/02/25 10:50:10 UTC

Zookeeper for generating sequential IDs

Hi all,

I am involved in a project where we're building a social application
using Cassandra DB and Java. I am looking for a solution to generate
unique sequential IDs for the content on the application. Some people
have suggested that I look at ZooKeeper for this. I would highly
appreciate it if anyone could tell me whether ZooKeeper is suitable
for this purpose, and point me to any good resources for learning
about ZooKeeper.

Since the application is built on an eventually consistent distributed
platform (Cassandra), we felt the need to look at other solutions
instead of building our own on top of our DB.

Any comments or suggestions are highly welcome! :)

Regards
Ertio Lew.

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Thanks Andrew, however I would prefer to stay away from supercolumns
because of their well-known limitations.

Regarding snowflake, I think I can make it work for me by reducing the
sequence number from its current 12 bits to 8 bits and using the 4 bits
saved to store the category of data. This would reduce the theoretical
limit from 4096 IDs per millisecond per machine to 256 IDs per
millisecond per machine, which still sounds more than good enough for
my use case.
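
The bit split described above can be sketched roughly as follows. This is only an illustration, not snowflake's actual code: the 41-bit timestamp and 12-bit sequence come from this thread, the 10-bit worker field is an assumption taken from snowflake's published layout, and the names `pack_id`, `unpack_id`, and the choice to put the category in the lowest bits are hypothetical.

```python
# Hypothetical 63-bit layout: timestamp | worker | sequence | category.
TIMESTAMP_BITS = 41  # milliseconds since a custom epoch (from the thread)
WORKER_BITS = 10     # assumed worker/machine id width
SEQUENCE_BITS = 8    # reduced from 12 -> 256 ids per ms per machine
CATEGORY_BITS = 4    # the 4 bits taken from the sequence field

CATEGORY_SHIFT = 0
SEQUENCE_SHIFT = CATEGORY_BITS
WORKER_SHIFT = SEQUENCE_SHIFT + SEQUENCE_BITS
TIMESTAMP_SHIFT = WORKER_SHIFT + WORKER_BITS


def _check(value, bits, name):
    """Reject values that would spill into a neighbouring field."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits: {value}")


def pack_id(timestamp_ms, worker, sequence, category):
    """Combine the four fields into one integer that fits in 63 bits."""
    _check(timestamp_ms, TIMESTAMP_BITS, "timestamp_ms")
    _check(worker, WORKER_BITS, "worker")
    _check(sequence, SEQUENCE_BITS, "sequence")
    _check(category, CATEGORY_BITS, "category")
    return (
        (timestamp_ms << TIMESTAMP_SHIFT)
        | (worker << WORKER_SHIFT)
        | (sequence << SEQUENCE_SHIFT)
        | (category << CATEGORY_SHIFT)
    )


def unpack_id(id_value):
    """Split an id back into (timestamp_ms, worker, sequence, category)."""
    return (
        id_value >> TIMESTAMP_SHIFT,
        (id_value >> WORKER_SHIFT) & ((1 << WORKER_BITS) - 1),
        (id_value >> SEQUENCE_SHIFT) & ((1 << SEQUENCE_BITS) - 1),
        (id_value >> CATEGORY_SHIFT) & ((1 << CATEGORY_BITS) - 1),
    )
```

With the category in the lowest bits, two keys for the same entity differ only in their last 4 bits, so `pack_id(t, w, s, 0)` and `pack_id(t, w, s, 1)` sort next to each other.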

@Jeff, would you like to say something on this idea?

Thank you all..
Ertio



On Tue, Mar 1, 2011 at 6:44 AM, Andrew Ebaugh <ae...@real.com> wrote:
> Getting a bit into Cassandra weeds, but what about using a super column
> and TimeUUIDType keys? IMO splitting data for one unique item into
> multiple manipulated keys sounds complex, and more what a super column was
> made for.
>
> So instead of having:
>
> TimeIDA-Name -> {name column data}
> TimeIDA-Blah -> {blah column data}
> TimeIDB-Name -> ...
>
> You'd have :
>
> TimeIDA ->
>  {
>  Name -> {name column data}
>  Blah -> {blah column data}
>  }
>
> TimeIDB ->
>  ...
>
>
> This would give you the advantage of being able to query key slices based
> on time ranges.
> Here's a good article (seems a bit outdated for 0.7):
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
>
>
>
> On 2/28/11 9:50 AM, "Ertio Lew" <er...@gmail.com> wrote:
>
>>Thanks Jeff !
>>
>>Your point is truly valid! However... even my idea is "not to store
>>information about the data/entities in the Id" but to split the
>>several data of an entity into several rows(according to category of
>>that data) in same CF in Cassandra.
>>So for e.g. if you want to split the information about a tweet in two
>>rows according to the 'type of information', then you want two keys
>>generated using the same ID.
>>
>>For this purpose you definitely need to have some kind of manipulation
>>required with your Ids. Or otherwise you cannot split the data for a
>>particular entity (in same CF) in two rows, according to data
>>category. Of course you can also suggest to store different types of
>>data in different CFs but sometimes it is more optimal to keep a limit
>>on the no of CFs in Cassandra.
>>
>>Regards
>>Ertio Lew
>>
>>
>>
>>On Mon, Feb 28, 2011 at 10:50 PM, Jeff Hodges <jh...@twitter.com> wrote:
>>> Also, feel free to mock me for the phrase "identifying id".
>>>
>>> On Mon, Feb 28, 2011 at 9:04 AM, Jeff Hodges <jh...@twitter.com>
>>>wrote:
>>>> If you patch snowflake to remove 4 bits from the timestamp section,
>>>> you will take the time that it takes before the IDs generated overflow
>>>> the JVM 63-bit limit from about 70 years (2 ** 41 milliseconds) to a
>>>> little over 4 years (2 ** 37 milliseconds). This is likely
>>>> unacceptable for your use case.
>>>>
>>>> However, the larger point to discuss is that encoding additional
>>>> information about your data in the identifying id is, in general, a
>>>> bad idea. It means your architecture is strictly coupled to your
>>>> current and likely less-than-perfect understanding of the problem and
>>>> makes it harder to iterate towards a better one. For instance, we had
>>>> to rewrite certain parts of our search infrastructure when migrating
>>>> to snowflake because it had assumed that the generated id space of
>>>> tweets was uniform across time.
>>>>
>>>> But, of course, I'm just some dude on the internet who doesn't know
>>>> your particular problem or design in detail. God speed and good luck.
>>>>
>>>> On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <er...@gmail.com> wrote:
>>>>> Yes I think we could perhaps reduce the micro seconds precision
>>>>> provided by it(I think 41 bits) to an appropriate extent to match our
>>>>> needs.
>>>>>
>>>>> On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <te...@gmail.com>
>>>>>wrote:
>>>>>> So patch it!
>>>>>>
>>>>>> On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <er...@gmail.com>
>>>>>>wrote:
>>>>>>
>>>>>>> First that it does not start at 0 since it comprises timestamp,
>>>>>>> workerId and noOfGeneratedIds.
>>>>>>> Thus it is not sequential! Secondly if I insert my 4 bits into this ID
>>>>>>> then I risk* that it might overwrite the already existing ID created
>>>>>>> by it.
>>>>>>>
>>>>>>> On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <te...@gmail.com>
>>>>>>> wrote:
>>>>>>> > Uh.... any sequential generator that starts at zero will take a LONG time
>>>>>>> > until it generates a value > 2^60.
>>>>>>> >
>>>>>>> > If you generate a million id's per second (= 2^20) then it will be longer
>>>>>>> > than 30,000 years before you get past 2^60.
>>>>>>> >
>>>>>>> > Is this *really* a problem?
>>>>>>> >
>>>>>>> > On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <er...@gmail.com>
>>>>>>>wrote:
>>>>>>> >
>>>>>>> >> Could you recommend any other ID generator that could help me
>>>>>>>with
>>>>>>> >> increasing Ids(not necessarily sequential) with size<= 60 bits ?
>>>>>>> >>
>>>>>>> >> Thanks
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <er...@gmail.com>
>>>>>>>wrote:
>>>>>>> >> > Thanks Patrick,
>>>>>>> >> >
>>>>>>> >> > I considered your suggestion. But sadly it could not fit my
>>>>>>>use case.
>>>>>>> >> > I am looking for a solution that could help me generate 64
>>>>>>>bits Ids
>>>>>>> >> > but in those 64 bits I would like atleast 4 free bits so that
>>>>>>>I could
>>>>>>> >> > manage with those free bits to distinguish the type of data
>>>>>>>for a
>>>>>>> >> > particular entity in the same columnfamily.
>>>>>>> >> >
>>>>>>> >> > If I could keep the snowflake's Id size to around 60 bits,
>>>>>>>that would
>>>>>>> >> > have been great..
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt
>>>>>>><ph...@apache.org>
>>>>>>> wrote:
>>>>>>> >> >> Keep in mind that blog post is pretty old. I see comments
>>>>>>>like this
>>>>>>> in
>>>>>>> >> >> the commit log
>>>>>>> >> >>
>>>>>>> >> >> "hard to call it alpha/experimental after serving billions of
>>>>>>>ids"
>>>>>>> >> >>
>>>>>>> >> >> so it seems it's in production at twitter at least...
>>>>>>> >> >>
>>>>>>> >> >> Patrick
>>>>>>> >> >>
>>>>>>> >> >> On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew
>>>>>>><er...@gmail.com>
>>>>>>> wrote:
>>>>>>> >> >>> Thanks Patrick,
>>>>>>> >> >>>
>>>>>>> >> >>> The fact that it is still in the alpha stage and twitter is
>>>>>>>not yet
>>>>>>> >> >>> using it, makes me look to other solutions as well, which
>>>>>>>have a
>>>>>>> large
>>>>>>> >> >>> community/users base & are more mature.
>>>>>>> >> >>>
>>>>>>> >> >>> I do not know much about the snowflake if it is being used in
>>>>>>> >> >>> production by anyone ..
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>>
>>>>>>> >> >>> On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt
>>>>>>><ph...@apache.org>
>>>>>>> >> wrote:
>>>>>>> >> >>>> Have you looked at snowflake?
>>>>>>> >> >>>>
>>>>>>> >> >>>>
>>>>>>>http://engineering.twitter.com/2010/06/announcing-snowflake.html
>>>>>>> >> >>>>
>>>>>>> >> >>>> Patrick
>>>>>>> >> >>>>
>>>>>>> >> >>>> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <
>>>>>>> ted.dunning@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >> >>>>> If your id's don't need to be exactly sequential or if the generation rate
>>>>>>> >> >>>>> is less than a few thousand per second, ZK is a fine choice.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> To get very high generation rates, what is typically done is to allocate
>>>>>>> >> >>>>> blocks of id's using ZK and then allocate out of the block locally.  This
>>>>>>> >> >>>>> can cause you to wind up with a slightly swiss-cheesed id space and it means
>>>>>>> >> >>>>> that the ordering of id's only approximates the time ordering of when the
>>>>>>> >> >>>>> id's were assigned.  Neither of these is typically a problem.
>>>>>>> >> >>>>>
>>>>>>> >> >>>>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew
>>>>>>><er...@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >> >>>>>
>>>>>>> >> >>>>>> Hi all,
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> I am involved in a project where we're building a social
>>>>>>> application
>>>>>>> >> >>>>>> using Cassandra DB and Java. I am looking for a solution
>>>>>>>to
>>>>>>> generate
>>>>>>> >> >>>>>> unique sequential IDs for the content on the application.
>>>>>>>I have
>>>>>>> >> been
>>>>>>> >> >>>>>> suggested by some people to have a look  to Zookeeper for
>>>>>>>this. I
>>>>>>> >> >>>>>> would highly appreciate if anyone can suggest if
>>>>>>>zookeeper is
>>>>>>> >> suitable
>>>>>>> >> >>>>>> for this purpose and any good resources to gain
>>>>>>>information about
>>>>>>> >> >>>>>> zookeeper.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Since the application is based on a eventually consistent
>>>>>>> >> distributed
>>>>>>> >> >>>>>> platform using Cassandra, we have felt a need to look
>>>>>>>over to
>>>>>>> other
>>>>>>> >> >>>>>> solutions instead of building our own using our DB.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Any kind of comments, suggestions are highly welcomed! :)
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>> Regards
>>>>>>> >> >>>>>> Ertio Lew.
>>>>>>> >> >>>>>>
>>>>>>> >> >>>>>
>>>>>>> >> >>>>
>>>>>>> >> >>>
>>>>>>> >> >>
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
>
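
The block-allocation approach Ted Dunning describes in the quoted text above could look roughly like this. It is a minimal sketch under stated assumptions: `InMemoryCoordinator` is a hypothetical in-process stand-in for a shared counter that would live in ZooKeeper, and no real ZooKeeper client calls are shown. `BlockIdAllocator` and `block_size` are illustrative names as well.

```python
import threading


class InMemoryCoordinator:
    """Hypothetical stand-in for a shared counter kept in ZooKeeper."""

    def __init__(self):
        self._next_start = 0
        self._lock = threading.Lock()

    def reserve_block(self, size):
        """Atomically hand out the start of a fresh block of `size` ids."""
        with self._lock:
            start = self._next_start
            self._next_start += size
            return start


class BlockIdAllocator:
    """Hands out ids locally, contacting the coordinator once per block."""

    def __init__(self, coordinator, block_size=1000):
        self._coordinator = coordinator
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive; an empty block forces a reservation
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                start = self._coordinator.reserve_block(self._block_size)
                self._next = start
                self._end = start + self._block_size
            value = self._next
            self._next += 1
            return value
```

Two allocators sharing one coordinator never hand out the same id, but their ids interleave out of time order. Any unused remainder of a block is lost if the process holding it restarts, which is the "swiss-cheesed" id space mentioned in the quote.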

Re: Zookeeper for generating sequential IDs

Posted by Andrew Ebaugh <ae...@real.com>.

Getting a bit into Cassandra weeds, but what about using a super column
and TimeUUIDType keys? IMO splitting data for one unique item into
multiple manipulated keys sounds complex, and more what a super column was
made for.

So instead of having:

TimeIDA-Name -> {name column data}
TimeIDA-Blah -> {blah column data}
TimeIDB-Name -> ...

You'd have:

TimeIDA ->
 {
  Name -> {name column data}
  Blah -> {blah column data}
 }

TimeIDB ->
  ...


This would give you the advantage of being able to query key slices based
on time ranges.
Here's a good article (seems a bit outdated for 0.7):
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
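
The two layouts compared above can be pictured with plain Python dictionaries. This is only a sketch of the shape of the data, not Cassandra client code: integer timestamps stand in for TimeUUID keys, and `slice_by_time` is a hypothetical helper for this example.

```python
# Flat layout: the category is folded into the row key.
flat_rows = {
    (1000, "Name"): {"value": "name column data"},
    (1000, "Blah"): {"value": "blah column data"},
    (2000, "Name"): {"value": "other name data"},
}

# Nested layout: one row per time-based id, one super column per category.
nested_rows = {
    1000: {
        "Name": {"value": "name column data"},
        "Blah": {"value": "blah column data"},
    },
    2000: {
        "Name": {"value": "other name data"},
    },
}


def slice_by_time(rows, start, end):
    """Return the rows whose time-based key lies in [start, end]."""
    return {key: rows[key] for key in sorted(rows) if start <= key <= end}


# All categories of an entity come back together in one row.
recent = slice_by_time(nested_rows, 500, 1500)
```

In the nested layout a time-range slice returns each entity once with all of its categories, whereas the flat layout needs one key per category for the same entity.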




Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Thanks Jeff !

Your point is truly valid! However, my idea is not to store
information about the data/entities in the ID, but to split the
data of one entity into several rows (according to the category of
that data) in the same CF in Cassandra.
For example, if you want to split the information about a tweet into two
rows according to the 'type of information', then you need two keys
generated from the same ID.

For this purpose you definitely need some kind of manipulation
of your IDs. Otherwise you cannot split the data for a
particular entity (in the same CF) into two rows according to data
category. Of course you could also suggest storing different types of
data in different CFs, but sometimes it is better to limit
the number of CFs in Cassandra.

Regards
Ertio Lew




Re: Zookeeper for generating sequential IDs

Posted by Jeff Hodges <jh...@twitter.com>.

Also, feel free to mock me for the phrase "identifying id".


Re: Zookeeper for generating sequential IDs

Posted by Jeff Hodges <jh...@twitter.com>.

If you patch snowflake to remove 4 bits from the timestamp section,
you will cut the time before the generated IDs overflow the 63-bit
limit of a signed JVM long from about 70 years (2 ** 41 milliseconds)
to a little over 4 years (2 ** 37 milliseconds). This is likely
unacceptable for your use case.
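
As a rough check of the figures above, here is a minimal Python sketch. It is
not taken from snowflake; the function name and the 365.25-day year are
assumptions made for illustration. It converts a millisecond counter of a
given bit width into the number of years before that counter wraps.

```python
# Back-of-the-envelope check of the timestamp lifetimes quoted above.
# MS_PER_YEAR assumes an average year of 365.25 days.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def years_until_rollover(timestamp_bits: int) -> float:
    """Years before a millisecond counter of the given width wraps around."""
    return (2 ** timestamp_bits) / MS_PER_YEAR


if __name__ == "__main__":
    for bits in (41, 37):
        print(f"{bits}-bit timestamp: {years_until_rollover(bits):.1f} years")
```

Each bit removed from the timestamp halves the lifetime, so dropping 4 bits
divides it by 16.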

However, the larger point to discuss is that encoding additional
information about your data in the identifying id is, in general, a
bad idea. It means your architecture is tightly coupled to your
current and likely less-than-perfect understanding of the problem and
makes it harder to iterate towards a better one. For instance, we had
to rewrite certain parts of our search infrastructure when migrating
to snowflake because it had assumed that the generated id space of
tweets was uniform across time.

But, of course, I'm just some dude on the internet who doesn't know
your particular problem or design in detail. Godspeed and good luck.

On Mon, Feb 28, 2011 at 8:35 AM, Ertio Lew <er...@gmail.com> wrote:
> Yes, I think we could perhaps reduce the millisecond timestamp
> precision it provides (41 bits, I think) to an extent appropriate
> for our needs.

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Yes, I think we could perhaps reduce the millisecond timestamp
precision it provides (41 bits, I think) to an extent appropriate
for our needs.

On Mon, Feb 28, 2011 at 9:38 PM, Ted Dunning <te...@gmail.com> wrote:
> So patch it!

Re: Zookeeper for generating sequential IDs

Posted by Ted Dunning <te...@gmail.com>.

So patch it!

On Mon, Feb 28, 2011 at 7:59 AM, Ertio Lew <er...@gmail.com> wrote:

> First, it does not start at 0, since each ID comprises a timestamp,
> a workerId and noOfGeneratedIds.
> Thus it is not sequential! Second, if I insert my 4 bits into this ID,
> then I risk overwriting the existing ID that it created.

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

First, it does not start at 0, since each ID comprises a timestamp,
a workerId and noOfGeneratedIds.
Thus it is not sequential! Second, if I insert my 4 bits into this ID,
then I risk overwriting the existing ID that it created.
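
To illustrate why such an ID increases over time but is not sequential, here
is a minimal Python sketch of a snowflake-style layout. It is not snowflake's
actual code. The 41-bit timestamp and 12-bit sequence widths are the figures
mentioned in this thread; the 10-bit worker field and the name compose_id are
assumptions made for illustration.

```python
TIMESTAMP_BITS = 41  # millisecond timestamp width mentioned in this thread
WORKER_BITS = 10     # assumed worker id width (not stated in this thread)
SEQUENCE_BITS = 12   # per-millisecond sequence width mentioned in this thread


def compose_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one integer, timestamp in the high bits."""
    for value, bits in ((timestamp_ms, TIMESTAMP_BITS),
                        (worker_id, WORKER_BITS),
                        (sequence, SEQUENCE_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return ((timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
            | (worker_id << SEQUENCE_BITS)
            | sequence)


if __name__ == "__main__":
    first = compose_id(timestamp_ms=1000, worker_id=3, sequence=0)
    second = compose_id(timestamp_ms=1000, worker_id=3, sequence=1)
    later = compose_id(timestamp_ms=1001, worker_id=3, sequence=0)
    # IDs grow with time, but the gap between them is not 1.
    print(first, second, later)
    print(later - second)
```

Because every bit of the 63-bit value already belongs to one of the three
fields, any extra 4-bit tag has to take its bits from one of them.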

On Mon, Feb 28, 2011 at 9:16 PM, Ted Dunning <te...@gmail.com> wrote:
> Uh.... any sequential generator that starts at zero will take a LONG time
> until it generates a value > 2^60.
>
> If you generate a million id's per second (about 2^20) then it will be longer
> than 30,000 years before you get past 2^60.
>
> Is this *really* a problem?

Re: Zookeeper for generating sequential IDs

Posted by Ted Dunning <te...@gmail.com>.

Uh.... any sequential generator that starts at zero will take a LONG time
until it generates a value > 2^60.

If you generate a million id's per second (about 2^20) then it will be longer
than 30,000 years before you get past 2^60.

Is this *really* a problem?
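
The estimate above can be checked with a short Python sketch. The function
name and the 365.25-day year are assumptions made for illustration. It
computes how long a counter starting at zero takes to reach 2^60 at a given
generation rate.

```python
# SECONDS_PER_YEAR assumes an average year of 365.25 days.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25


def years_to_reach(limit_bits: int, ids_per_second: float) -> float:
    """Years for a counter starting at zero to reach 2**limit_bits."""
    return (2 ** limit_bits) / ids_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Both 2**20 and an exact million per second give well over 30,000 years.
    print(f"{years_to_reach(60, 2 ** 20):,.0f} years at 2^20 ids/s")
    print(f"{years_to_reach(60, 1_000_000):,.0f} years at 10^6 ids/s")
```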

On Mon, Feb 28, 2011 at 7:25 AM, Ertio Lew <er...@gmail.com> wrote:

> Could you recommend any other ID generator that could help me with
> increasing IDs (not necessarily sequential) of size <= 60 bits?
>
> Thanks

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Could you recommend any other ID generator that could help me with
increasing IDs (not necessarily sequential) of size <= 60 bits?

Thanks


On Mon, Feb 28, 2011 at 8:30 PM, Ertio Lew <er...@gmail.com> wrote:
> Thanks Patrick,
>
> I considered your suggestion, but sadly it does not fit my use case.
> I am looking for a solution that can generate 64-bit IDs in which
> at least 4 bits are left free, so that I can use those free bits
> to distinguish the type of data for a particular entity in the same
> column family.
>
> If I could keep snowflake's ID size to around 60 bits, that would
> be great.

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Thanks Patrick,

I considered your suggestion, but sadly it does not fit my use case.
I am looking for a solution that can generate 64-bit IDs in which
at least 4 bits are left free, so that I can use those free bits
to distinguish the type of data for a particular entity in the same
column family.

If I could keep snowflake's ID size to around 60 bits, that would
be great.
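
To make the idea concrete, here is a minimal Python sketch of combining a
60-bit ID with a 4-bit type tag in one 64-bit key. It is only an illustration:
the names pack and unpack, and the choice to put the tag in the low bits (so
that keys still sort by the underlying ID), are assumptions and not part of
any ID generator discussed here.

```python
ID_BITS = 60    # width of the generated ID
TYPE_BITS = 4   # width of the data-type tag
TYPE_MASK = (1 << TYPE_BITS) - 1


def pack(base_id: int, type_tag: int) -> int:
    """Combine a 60-bit ID and a 4-bit type tag; the tag sits in the low bits."""
    if not 0 <= base_id < (1 << ID_BITS):
        raise ValueError("base_id does not fit in 60 bits")
    if not 0 <= type_tag <= TYPE_MASK:
        raise ValueError("type_tag does not fit in 4 bits")
    return (base_id << TYPE_BITS) | type_tag


def unpack(key: int) -> tuple:
    """Split a packed key back into (base_id, type_tag)."""
    return key >> TYPE_BITS, key & TYPE_MASK


if __name__ == "__main__":
    key = pack(base_id=123, type_tag=5)
    print(key, unpack(key))
```

Note that a packed key can use all 64 bits, so in a signed 64-bit type the
largest keys would read as negative numbers.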





On Sat, Feb 26, 2011 at 5:13 AM, Patrick Hunt <ph...@apache.org> wrote:
> Keep in mind that blog post is pretty old. I see comments like this in
> the commit log
>
> "hard to call it alpha/experimental after serving billions of ids"
>
> so it seems it's in production at Twitter at least...
>
> Patrick

Re: Zookeeper for generating sequential IDs

Posted by Patrick Hunt <ph...@apache.org>.

Keep in mind that blog post is pretty old. I see comments like this in
the commit log

"hard to call it alpha/experimental after serving billions of ids"

so it seems it's in production at Twitter at least...

Patrick

On Fri, Feb 25, 2011 at 2:58 PM, Ertio Lew <er...@gmail.com> wrote:
> Thanks Patrick,
>
> The fact that it is still in the alpha stage and Twitter is not yet
> using it makes me look at other solutions as well, ones that have a
> large community/user base and are more mature.
>
> I do not know much about snowflake, or whether anyone is using it
> in production.

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Thanks Patrick,

The fact that it is still in the alpha stage and Twitter is not yet
using it makes me look at other solutions as well, ones that have a
large community/user base and are more mature.

I do not know much about snowflake, or whether anyone is using it in
production.




On Fri, Feb 25, 2011 at 11:21 PM, Patrick Hunt <ph...@apache.org> wrote:
> Have you looked at snowflake?
>
> http://engineering.twitter.com/2010/06/announcing-snowflake.html
>
> Patrick
>
> On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <te...@gmail.com> wrote:
>> If your id's don't need to be exactly sequential or if the generation rate
>> is less than a few thousand per second, ZK is a fine choice.
>>
>> To get very high generation rates, what is typically done is to allocate
>> blocks of id's using ZK and then allocate out of the block locally.  This
>> can cause you to wind up with a slightly swiss-cheesed id space and it means
>> that the ordering of id's only approximates the time ordering of when the
>> id's were assigned.  Neither of these is typically a problem.
>>
>> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <er...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am involved in a project where we're building a social application
>>> using Cassandra DB and Java. I am looking for a solution to generate
>>> unique sequential IDs for the content on the application. I have been
>>> suggested by some people to have a look  to Zookeeper for this. I
>>> would highly appreciate if anyone can suggest if zookeeper is suitable
>>> for this purpose and any good resources to gain information about
>>> zookeeper.
>>>
>>> Since the application is based on a eventually consistent distributed
>>> platform using Cassandra, we have felt a need to look over to other
>>> solutions instead of building our own using our DB.
>>>
>>> Any kind of comments, suggestions are highly welcomed! :)
>>>
>>> Regards
>>> Ertio Lew.
>>>
>>
>

Re: Zookeeper for generating sequential IDs

Posted by Patrick Hunt <ph...@apache.org>.

Have you looked at snowflake?

http://engineering.twitter.com/2010/06/announcing-snowflake.html

Patrick

On Fri, Feb 25, 2011 at 9:43 AM, Ted Dunning <te...@gmail.com> wrote:
> If your id's don't need to be exactly sequential or if the generation rate
> is less than a few thousand per second, ZK is a fine choice.
>
> To get very high generation rates, what is typically done is to allocate
> blocks of id's using ZK and then allocate out of the block locally.  This
> can cause you to wind up with a slightly swiss-cheesed id space and it means
> that the ordering of id's only approximates the time ordering of when the
> id's were assigned.  Neither of these is typically a problem.
>
> On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <er...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am involved in a project where we're building a social application
>> using Cassandra DB and Java. I am looking for a solution to generate
>> unique sequential IDs for the content on the application. I have been
>> suggested by some people to have a look  to Zookeeper for this. I
>> would highly appreciate if anyone can suggest if zookeeper is suitable
>> for this purpose and any good resources to gain information about
>> zookeeper.
>>
>> Since the application is based on a eventually consistent distributed
>> platform using Cassandra, we have felt a need to look over to other
>> solutions instead of building our own using our DB.
>>
>> Any kind of comments, suggestions are highly welcomed! :)
>>
>> Regards
>> Ertio Lew.
>>
>

Re: Zookeeper for generating sequential IDs

Posted by Ted Dunning <te...@gmail.com>.

If your IDs don't need to be exactly sequential or if the generation rate
is less than a few thousand per second, ZK is a fine choice.

To get very high generation rates, what is typically done is to allocate
blocks of IDs using ZK and then allocate out of the block locally.  This
can leave you with a slightly swiss-cheesed ID space, and it means that
the ordering of IDs only approximates the time order in which the IDs
were assigned.  Neither of these is typically a problem.
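
The block scheme described above can be sketched roughly as follows. This
is only an illustration in Python, not code from ZooKeeper or any client
library: BlockCoordinator is a hypothetical in-memory stand-in for the
shared counter that would really be kept in ZK, and the class names and the
block size of 1000 are assumptions made for the example.

```python
import threading


class BlockCoordinator:
    """Hypothetical in-memory stand-in for a counter stored in ZooKeeper."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._next_start = 0
        self._lock = threading.Lock()

    def reserve_block(self):
        # One coordinated update hands out a whole block of IDs.
        # With a real ZK ensemble this would be the only remote call.
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
        return start, start + self.block_size


class LocalIdAllocator:
    """Hands out IDs from its current block without contacting the coordinator."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._next = 0
        self._end = 0  # empty block, so the first call reserves one

    def next_id(self):
        if self._next >= self._end:
            # Block exhausted (or never reserved): fetch a fresh one.
            self._next, self._end = self._coordinator.reserve_block()
        value = self._next
        self._next += 1
        return value


coordinator = BlockCoordinator(block_size=1000)
a = LocalIdAllocator(coordinator)
b = LocalIdAllocator(coordinator)

ids_a = [a.next_id() for _ in range(3)]
ids_b = [b.next_id() for _ in range(2)]
print(ids_a, ids_b)
```

Here client a draws from block 0..999 and client b from block 1000..1999.
If a stops after three IDs, the rest of its block is never used, which is
the swiss-cheese effect. Also, b can hand out 1000 before a hands out 3, so
ID order only roughly follows assignment time.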

On Fri, Feb 25, 2011 at 1:50 AM, Ertio Lew <er...@gmail.com> wrote:

> Hi all,
>
> I am involved in a project where we're building a social application
> using Cassandra DB and Java. I am looking for a solution to generate
> unique sequential IDs for the content on the application. I have been
> suggested by some people to have a look  to Zookeeper for this. I
> would highly appreciate if anyone can suggest if zookeeper is suitable
> for this purpose and any good resources to gain information about
> zookeeper.
>
> Since the application is based on a eventually consistent distributed
> platform using Cassandra, we have felt a need to look over to other
> solutions instead of building our own using our DB.
>
> Any kind of comments, suggestions are highly welcomed! :)
>
> Regards
> Ertio Lew.
>

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

We also wanted support for 4-byte integer IDs, but this may still be of
interest to us.

Thanks
Ertio Lew

On Wed, Mar 9, 2011 at 10:17 AM, David Rosenstrauch <da...@darose.net> wrote:
> The library currently generates ID's as java longs.  (i.e., 8 byte
> integers).  Does that work for you?
>
> I keep trying to free up time to release this, but I keep getting buried at
> work!  :-(  Will try my best to get this out soon.
>
> HTH,
>
> DR
>
> On 03/08/2011 01:21 PM, Ertio Lew wrote:
>>
>> Thanks so much David !!
>>
>> Your solution seems to perfectly fulfill our requirements of
>> continuous and monotonically  increasing Ids. What is the size of your
>> Ids in bytes??
>>
>> We are particularly looking for i32 and i64 sized ids.
>> Are you planning to release this work to community anytime sooner ?
>>
>> Thanks anyways for sharing knowledge.
>>
>>
>> On Tue, Mar 8, 2011 at 11:43 PM, David Rosenstrauch <da...@darose.net> wrote:
>>>
>>> On 03/08/2011 01:09 PM, David Rosenstrauch wrote:
>>>>
>>>> On 02/25/2011 04:50 AM, Ertio Lew wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am involved in a project where we're building a social application
>>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>>>> unique sequential IDs for the content on the application. I have been
>>>>> suggested by some people to have a look to Zookeeper for this. I
>>>>> would highly appreciate if anyone can suggest if zookeeper is suitable
>>>>> for this purpose and any good resources to gain information about
>>>>> zookeeper.
>>>>>
>>>>> Since the application is based on a eventually consistent distributed
>>>>> platform using Cassandra, we have felt a need to look over to other
>>>>> solutions instead of building our own using our DB.
>>>>>
>>>>> Any kind of comments, suggestions are highly welcomed! :)
>>>>>
>>>>> Regards
>>>>> Ertio Lew.
>>>>
>>>> I ran into a similar id-generation issue, and wrote a library for it.
>>>> (Details described in this msg:
>>>> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%3C4C5B7656.4020200@darose.net%3E
>>>> .)
>>>>
>>>> Been planning to release it to the community, but haven't gotten around
>>>> to it yet.
>>>>
>>>> Not sure my solution is exactly what you're looking for though.
>>>>
>>>> HTH,
>>>>
>>>> DR
>>>
>>> BTW, that email is old.  We now have had this running quite reliably in
>>> production for several months now.  It's being used by M/R jobs running 100
>>> simultaneous reducers, each accessing the ID generator, and assigning nearly
>>> 1 million ID's per job in total.
>>>
>>> DR
>>>
>
>

Re: Zookeeper for generating sequential IDs

Posted by David Rosenstrauch <da...@darose.net>.

The library currently generates IDs as Java longs (i.e., 8-byte
integers).  Does that work for you?
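
For the 4-byte versus 8-byte question, some rough arithmetic may help. The
sketch below (Python, purely illustrative) assumes IDs start at 0, use only
the non-negative range of a signed integer, and are consumed at about
1 million per job, the figure mentioned elsewhere in this thread. None of
this describes how the library itself allocates IDs.

```python
# Largest values of signed 32-bit and 64-bit integers (Java int and long).
MAX_I32 = 2**31 - 1
MAX_I64 = 2**63 - 1

# Assumption: "nearly 1 million ID's per job", rounded up to 1 million.
IDS_PER_JOB = 1_000_000

# Number of whole jobs that fit before each ID space runs out.
jobs_until_i32_exhausted = MAX_I32 // IDS_PER_JOB
jobs_until_i64_exhausted = MAX_I64 // IDS_PER_JOB

print("i32:", jobs_until_i32_exhausted, "jobs")
print("i64:", jobs_until_i64_exhausted, "jobs")
```

Under those assumptions a 4-byte ID space lasts a little over two thousand
such jobs, while an 8-byte space is effectively inexhaustible.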

I keep trying to free up time to release this, but I keep getting buried 
at work!  :-(  Will try my best to get this out soon.

HTH,

DR

On 03/08/2011 01:21 PM, Ertio Lew wrote:
> Thanks so much David !!
>
> Your solution seems to perfectly fulfill our requirements of
> continuous and monotonically  increasing Ids. What is the size of your
> Ids in bytes??
>
> We are particularly looking for i32 and i64 sized ids.
> Are you planning to release this work to community anytime sooner ?
>
> Thanks anyways for sharing knowledge.
>
>
> On Tue, Mar 8, 2011 at 11:43 PM, David Rosenstrauch<da...@darose.net>  wrote:
>> On 03/08/2011 01:09 PM, David Rosenstrauch wrote:
>>>
>>> On 02/25/2011 04:50 AM, Ertio Lew wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I am involved in a project where we're building a social application
>>>> using Cassandra DB and Java. I am looking for a solution to generate
>>>> unique sequential IDs for the content on the application. I have been
>>>> suggested by some people to have a look to Zookeeper for this. I
>>>> would highly appreciate if anyone can suggest if zookeeper is suitable
>>>> for this purpose and any good resources to gain information about
>>>> zookeeper.
>>>>
>>>> Since the application is based on a eventually consistent distributed
>>>> platform using Cassandra, we have felt a need to look over to other
>>>> solutions instead of building our own using our DB.
>>>>
>>>> Any kind of comments, suggestions are highly welcomed! :)
>>>>
>>>> Regards
>>>> Ertio Lew.
>>>
>>> I ran into a similar id-generation issue, and wrote a library for it.
>>> (Details described in this msg:
>>>
>>> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%3C4C5B7656.4020200@darose.net%3E
>>> .)
>>>
>>> Been planning to release it to the community, but haven't gotten around
>>> to it yet.
>>>
>>> Not sure my solution is exactly what you're looking for though.
>>>
>>> HTH,
>>>
>>> DR
>>
>> BTW, that email is old.  We now have had this running quite reliably in
>> production for several months now.  It's being used by M/R jobs running 100
>> simultaneous reducers, each accessing the ID generator, and assigning nearly
>> 1 million ID's per job in total.
>>
>> DR
>>

Re: Zookeeper for generating sequential IDs

Posted by Ertio Lew <er...@gmail.com>.

Thanks so much David !!

Your solution seems to perfectly fulfill our requirements of
continuous and monotonically increasing IDs. What is the size of your
IDs in bytes?

We are particularly looking for i32- and i64-sized IDs.
Are you planning to release this work to the community any time soon?

Thanks anyway for sharing your knowledge.


On Tue, Mar 8, 2011 at 11:43 PM, David Rosenstrauch <da...@darose.net> wrote:
> On 03/08/2011 01:09 PM, David Rosenstrauch wrote:
>>
>> On 02/25/2011 04:50 AM, Ertio Lew wrote:
>>>
>>> Hi all,
>>>
>>> I am involved in a project where we're building a social application
>>> using Cassandra DB and Java. I am looking for a solution to generate
>>> unique sequential IDs for the content on the application. I have been
>>> suggested by some people to have a look to Zookeeper for this. I
>>> would highly appreciate if anyone can suggest if zookeeper is suitable
>>> for this purpose and any good resources to gain information about
>>> zookeeper.
>>>
>>> Since the application is based on a eventually consistent distributed
>>> platform using Cassandra, we have felt a need to look over to other
>>> solutions instead of building our own using our DB.
>>>
>>> Any kind of comments, suggestions are highly welcomed! :)
>>>
>>> Regards
>>> Ertio Lew.
>>
>> I ran into a similar id-generation issue, and wrote a library for it.
>> (Details described in this msg:
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%3C4C5B7656.4020200@darose.net%3E
>> .)
>>
>> Been planning to release it to the community, but haven't gotten around
>> to it yet.
>>
>> Not sure my solution is exactly what you're looking for though.
>>
>> HTH,
>>
>> DR
>
> BTW, that email is old.  We now have had this running quite reliably in
> production for several months now.  It's being used by M/R jobs running 100
> simultaneous reducers, each accessing the ID generator, and assigning nearly
> 1 million ID's per job in total.
>
> DR
>

Re: Zookeeper for generating sequential IDs

Posted by David Rosenstrauch <da...@darose.net>.

On 03/08/2011 01:09 PM, David Rosenstrauch wrote:
> On 02/25/2011 04:50 AM, Ertio Lew wrote:
>> Hi all,
>>
>> I am involved in a project where we're building a social application
>> using Cassandra DB and Java. I am looking for a solution to generate
>> unique sequential IDs for the content on the application. I have been
>> suggested by some people to have a look to Zookeeper for this. I
>> would highly appreciate if anyone can suggest if zookeeper is suitable
>> for this purpose and any good resources to gain information about
>> zookeeper.
>>
>> Since the application is based on a eventually consistent distributed
>> platform using Cassandra, we have felt a need to look over to other
>> solutions instead of building our own using our DB.
>>
>> Any kind of comments, suggestions are highly welcomed! :)
>>
>> Regards
>> Ertio Lew.
>
> I ran into a similar id-generation issue, and wrote a library for it.
> (Details described in this msg:
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%3C4C5B7656.4020200@darose.net%3E
> .)
>
> Been planning to release it to the community, but haven't gotten around
> to it yet.
>
> Not sure my solution is exactly what you're looking for though.
>
> HTH,
>
> DR

BTW, that email is old.  We have now had this running quite reliably in
production for several months.  It's being used by M/R jobs running
100 simultaneous reducers, each accessing the ID generator, and
assigning nearly 1 million IDs per job in total.

DR

Re: Zookeeper for generating sequential IDs

Posted by David Rosenstrauch <da...@darose.net>.

On 02/25/2011 04:50 AM, Ertio Lew wrote:
> Hi all,
>
> I am involved in a project where we're building a social application
> using Cassandra DB and Java. I am looking for a solution to generate
> unique sequential IDs for the content on the application. I have been
> suggested by some people to have a look  to Zookeeper for this. I
> would highly appreciate if anyone can suggest if zookeeper is suitable
> for this purpose and any good resources to gain information about
> zookeeper.
>
> Since the application is based on a eventually consistent distributed
> platform using Cassandra, we have felt a need to look over to other
> solutions instead of building our own using our DB.
>
> Any kind of comments, suggestions are highly welcomed! :)
>
> Regards
> Ertio Lew.

I ran into a similar id-generation issue, and wrote a library for it. 
(Details described in this msg: 
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201008.mbox/%3C4C5B7656.4020200@darose.net%3E 
.)

Been planning to release it to the community, but haven't gotten around 
to it yet.

Not sure my solution is exactly what you're looking for though.

HTH,

DR