You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Jeremy Davis <je...@gmail.com> on 2010/04/02 03:32:39 UTC

Creating a Total Ordered Queue in Cassandra

I'm in the process of implementing a Totally Ordered Queue in Cassandra, and
wanted to bounce my ideas off the list and also see if there are any other
suggestions.

I've come up with an external source of ID's that are always increasing (but
not monotonic), and I've also used external synchronization to ensure only
one writer to a given queue. And I handle de-duping in the app.


My current solution is : (simplified)

Use the "QueueId", to Key into a row of a CF.
Then, every column in that CF corresponds to a new entry in the Queue, with
a custom Comparator to sort the columns by my external ID that is always
increasing.

Technically I never delete data from the Queue, and I just page through it
from a given ID using a SliceRange, etc.

Obviously the problem being that the row needs to get compacted. so then I
started bucketizing with multiple rows for a given queue (for example one
per day (again I'm simplifying))...(so the Key is now "QueueId+Day"...)

Does this seem reasonable? It's solvable, but is starting to seem
complicated to implement... It would be very easy if I didn't have to have
multiple buckets..



My other thought is to store one entry per row, and perform get_range_slices
and specify a KeyRange, with the OrderPreservingPartitioner.
But it isn't exactly clear to me what the Order of the keys are in this
system, so I don't know how to construct my key and queries appropriately...
Is this Lexical String Order? Or?

So for example.. Assuming my QueueId's are longs, and my ID's are also
longs.. My key would be (in Java):

long queueId;
long msgId;

key = "" + queueId + ":" + msgId;

And if I wanted to do a query my key range might be from
start = "" + queueId + ":0"
end = "" + queueId + ":" + Long.MAX_VALUE;

(Will I have to left pad the msgIds with 0's)?

And is this going to be efficient if my msgId isn't monotonically
increasing?

Thanks,
-JD

Re: Creating a Total Ordered Queue in Cassandra

Posted by Jeremy Davis <je...@gmail.com>.
Since twitter is everyone's favorite analogy:
It's like twitter, but faster and with bigger messages that I may need to go
back and replay in order to mine for more details at a later date.
Thus, I call it a queue, because the order of messages is important.. But
not anything like a message broker/pub-sub/topic/ etc...

-JD



On Thu, Apr 1, 2010 at 9:43 PM, Jeremy Davis
<je...@gmail.com>wrote:

>
> You are correct, it is not a queue in the classic sense... I'm storing the
> entire "conversation" with a client in perpetuity, and then playing it back
> in the order received.
>
> Rabbitmq/activemq etc all have about the same throughput 3-6K persistent
> messages/sec, and are not good for storing the conversation forever... Also
> I can easily scale cassandra past that message rate and not have to worry
> about which message broker/cluster I'm connecting to/has the
> conversation/etc.
>
>
>
>
> On Thu, Apr 1, 2010 at 7:02 PM, Keith Thornhill <ke...@raptr.com> wrote:
>
>> you mention never deleting from the queue, so what purpose is this
>> serving? (if you don't pop off the front, is it really a queue?)
>>
>> seems if guaranteed order of messages is required, there are many
>> other projects which are focused towards that problem (rabbitmq,
>> kestrel, activemq, etc)
>>
>> or am i misunderstanding your needs here?
>>
>> -keith
>>
>> On Thu, Apr 1, 2010 at 6:32 PM, Jeremy Davis
>> <je...@gmail.com> wrote:
>> > I'm in the process of implementing a Totally Ordered Queue in Cassandra,
>> and
>> > wanted to bounce my ideas off the list and also see if there are any
>> other
>> > suggestions.
>> >
>> > I've come up with an external source of ID's that are always increasing
>> (but
>> > not monotonic), and I've also used external synchronization to ensure
>> only
>> > one writer to a given queue. And I handle de-duping in the app.
>> >
>> >
>> > My current solution is : (simplified)
>> >
>> > Use the "QueueId", to Key into a row of a CF.
>> > Then, every column in that CF corresponds to a new entry in the Queue,
>> with
>> > a custom Comparator to sort the columns by my external ID that is always
>> > increasing.
>> >
>> > Technically I never delete data from the Queue, and I just page through
>> it
>> > from a given ID using a SliceRange, etc.
>> >
>> > Obviously the problem being that the row needs to get compacted. so then
>> I
>> > started bucketizing with multiple rows for a given queue (for example
>> one
>> > per day (again I'm simplifying))...(so the Key is now "QueueId+Day"...)
>> >
>> > Does this seem reasonable? It's solvable, but is starting to seem
>> > complicated to implement... It would be very easy if I didn't have to
>> have
>> > multiple buckets..
>> >
>> >
>> >
>> > My other thought is to store one entry per row, and perform
>> get_range_slices
>> > and specify a KeyRange, with the OrderPreservingPartitioner.
>> > But it isn't exactly clear to me what the Order of the keys are in this
>> > system, so I don't know how to construct my key and queries
>> appropriately...
>> > Is this Lexical String Order? Or?
>> >
>> > So for example.. Assuming my QueueId's are longs, and my ID's are also
>> > longs.. My key would be (in Java):
>> >
>> > long queueId;
>> > long msgId;
>> >
>> > key = "" + queueId + ":" + msgId;
>> >
>> > And if I wanted to do a query my key range might be from
>> > start = "" + queueId + ":0"
>> > end = "" + queueId + ":" + Long.MAX_VALUE;
>> >
>> > (Will I have to left pad the msgIds with 0's)?
>> >
>> > And is this going to be efficient if my msgId isn't monotonically
>> > increasing?
>> >
>> > Thanks,
>> > -JD
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>

Re: Creating a Total Ordered Queue in Cassandra

Posted by Tatu Saloranta <ts...@gmail.com>.
On Thu, Apr 1, 2010 at 9:43 PM, Jeremy Davis
<je...@gmail.com> wrote:
>
> You are correct, it is not a queue in the classic sense... I'm storing the
> entire "conversation" with a client in perpetuity, and then playing it back
> in the order received.
>
> Rabbitmq/activemq etc all have about the same throughput 3-6K persistent
> messages/sec, and are not good for storing the conversation forever... Also
> I can easily scale cassandra past that message rate and not have to worry
> about which message broker/cluster I'm connecting to/has the
> conversation/etc.

Also: I think RabbitMQ specifically does not have distributed message
stores -- each message lives in just one queue node, meaning that when
it is down (or gets wiped out), so are messages for that particular
queue. Otherwise it seems like a really nice queuing system.
The other potential concern is that all message metadata for it has to
fit in central memory (message payload can be persisted I think) of
the host that owns message.
So while RabbitMQ and ActiveMQ are obviously better matches for
queuing (with very powerful semantics, optional transactionality, etc.
etc. etc.)  Cassandra seems to have better distribution and
fault-tolerance properties. This could be useful for some scenarios.
In fact I wonder if "traditional" MQs could be considered quite a bit
like RDBMSs regarding scalability, regarding distribution, horizontal
scaling (or lack thereof) by adding nodes, and cost of ACID features
(high expresive power vs simple scalability).

I am actually also interested in similar aspects; using queue name and
sequence identifier for implementing queue-like constructs, and was
happy to see this question. But in my case, I would want to eventually
also delete messages, so I would not have to rely as much on
monotonically increasing ids aspect. This would allow
many-senders-single-receiver use case, with little or no external
synchronization.

-+ Tatu +-

Re: Creating a Total Ordered Queue in Cassandra

Posted by Jeremy Davis <je...@gmail.com>.
You are correct, it is not a queue in the classic sense... I'm storing the
entire "conversation" with a client in perpetuity, and then playing it back
in the order received.

Rabbitmq/activemq etc all have about the same throughput 3-6K persistent
messages/sec, and are not good for storing the conversation forever... Also
I can easily scale cassandra past that message rate and not have to worry
about which message broker/cluster I'm connecting to/has the
conversation/etc.



On Thu, Apr 1, 2010 at 7:02 PM, Keith Thornhill <ke...@raptr.com> wrote:

> you mention never deleting from the queue, so what purpose is this
> serving? (if you don't pop off the front, is it really a queue?)
>
> seems if guaranteed order of messages is required, there are many
> other projects which are focused towards that problem (rabbitmq,
> kestrel, activemq, etc)
>
> or am i misunderstanding your needs here?
>
> -keith
>
> On Thu, Apr 1, 2010 at 6:32 PM, Jeremy Davis
> <je...@gmail.com> wrote:
> > I'm in the process of implementing a Totally Ordered Queue in Cassandra,
> and
> > wanted to bounce my ideas off the list and also see if there are any
> other
> > suggestions.
> >
> > I've come up with an external source of ID's that are always increasing
> (but
> > not monotonic), and I've also used external synchronization to ensure
> only
> > one writer to a given queue. And I handle de-duping in the app.
> >
> >
> > My current solution is : (simplified)
> >
> > Use the "QueueId", to Key into a row of a CF.
> > Then, every column in that CF corresponds to a new entry in the Queue,
> with
> > a custom Comparator to sort the columns by my external ID that is always
> > increasing.
> >
> > Technically I never delete data from the Queue, and I just page through
> it
> > from a given ID using a SliceRange, etc.
> >
> > Obviously the problem being that the row needs to get compacted. so then
> I
> > started bucketizing with multiple rows for a given queue (for example one
> > per day (again I'm simplifying))...(so the Key is now "QueueId+Day"...)
> >
> > Does this seem reasonable? It's solvable, but is starting to seem
> > complicated to implement... It would be very easy if I didn't have to
> have
> > multiple buckets..
> >
> >
> >
> > My other thought is to store one entry per row, and perform
> get_range_slices
> > and specify a KeyRange, with the OrderPreservingPartitioner.
> > But it isn't exactly clear to me what the Order of the keys are in this
> > system, so I don't know how to construct my key and queries
> appropriately...
> > Is this Lexical String Order? Or?
> >
> > So for example.. Assuming my QueueId's are longs, and my ID's are also
> > longs.. My key would be (in Java):
> >
> > long queueId;
> > long msgId;
> >
> > key = "" + queueId + ":" + msgId;
> >
> > And if I wanted to do a query my key range might be from
> > start = "" + queueId + ":0"
> > end = "" + queueId + ":" + Long.MAX_VALUE;
> >
> > (Will I have to left pad the msgIds with 0's)?
> >
> > And is this going to be efficient if my msgId isn't monotonically
> > increasing?
> >
> > Thanks,
> > -JD
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>

Re: Creating a Total Ordered Queue in Cassandra

Posted by Keith Thornhill <ke...@raptr.com>.
you mention never deleting from the queue, so what purpose is this
serving? (if you don't pop off the front, is it really a queue?)

seems if guaranteed order of messages is required, there are many
other projects which are focused towards that problem (rabbitmq,
kestrel, activemq, etc)

or am i misunderstanding your needs here?

-keith

On Thu, Apr 1, 2010 at 6:32 PM, Jeremy Davis
<je...@gmail.com> wrote:
> I'm in the process of implementing a Totally Ordered Queue in Cassandra, and
> wanted to bounce my ideas off the list and also see if there are any other
> suggestions.
>
> I've come up with an external source of ID's that are always increasing (but
> not monotonic), and I've also used external synchronization to ensure only
> one writer to a given queue. And I handle de-duping in the app.
>
>
> My current solution is : (simplified)
>
> Use the "QueueId", to Key into a row of a CF.
> Then, every column in that CF corresponds to a new entry in the Queue, with
> a custom Comparator to sort the columns by my external ID that is always
> increasing.
>
> Technically I never delete data from the Queue, and I just page through it
> from a given ID using a SliceRange, etc.
>
> Obviously the problem being that the row needs to get compacted. so then I
> started bucketizing with multiple rows for a given queue (for example one
> per day (again I'm simplifying))...(so the Key is now "QueueId+Day"...)
>
> Does this seem reasonable? It's solvable, but is starting to seem
> complicated to implement... It would be very easy if I didn't have to have
> multiple buckets..
>
>
>
> My other thought is to store one entry per row, and perform get_range_slices
> and specify a KeyRange, with the OrderPreservingPartitioner.
> But it isn't exactly clear to me what the Order of the keys are in this
> system, so I don't know how to construct my key and queries appropriately...
> Is this Lexical String Order? Or?
>
> So for example.. Assuming my QueueId's are longs, and my ID's are also
> longs.. My key would be (in Java):
>
> long queueId;
> long msgId;
>
> key = "" + queueId + ":" + msgId;
>
> And if I wanted to do a query my key range might be from
> start = "" + queueId + ":0"
> end = "" + queueId + ":" + Long.MAX_VALUE;
>
> (Will I have to left pad the msgIds with 0's)?
>
> And is this going to be efficient if my msgId isn't monotonically
> increasing?
>
> Thanks,
> -JD
>
>
>
>
>
>
>
>
>
>
>
>
>