Posted to user@cassandra.apache.org by David Boxenhorn <da...@lookin2.com> on 2010/04/29 15:16:57 UTC

Basic Architecture Question

We want to store objects in Cassandra. In general, the mapping is quite
easy. But for some kinds of objects, we want to be able to read all of them
into memory.

We want to use random partitioning, which means that we can't do a range
query over keys (is this right?). Is there any way to get ALL the keys
directly (order is not important)? Or do I need to define a separate column
family, and save all the keys in a single column of that family? Or...
something else?

Re: How does cassandra deal with collisions?

Posted by Sylvain Lebresne <sy...@yakaz.com>.
Two rows are never compared by the MD5 of their keys. The MD5 of a row key is
only used to choose which nodes of the cluster are responsible for the row, so
a collision would put two rows on the same nodes without merging or
overwriting them.
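
To make the point above concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra's code: ToyRing, md5_token and the node names are hypothetical, the token derivation is simplified (the MD5 digest read as an unsigned integer), and replication is ignored. It only illustrates that the hash picks a node while the row is stored and looked up by its full key.

```python
import hashlib
from bisect import bisect_left


def md5_token(key: bytes) -> int:
    """Simplified token: the MD5 digest read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


class ToyRing:
    """Hypothetical in-memory ring; not Cassandra's actual implementation."""

    def __init__(self, node_tokens, token_fn=md5_token):
        # node_tokens maps a node name to the token that node owns.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]
        self._token_fn = token_fn
        self._rows = {node: {} for node in node_tokens}

    def node_for(self, key: bytes) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        index = bisect_left(self._tokens, self._token_fn(key)) % len(self._ring)
        return self._ring[index][1]

    def insert(self, key: bytes, columns: dict) -> None:
        # The row lives under its full key; the token only selected the node.
        row = self._rows[self.node_for(key)].setdefault(key, {})
        row.update(columns)

    def get(self, key: bytes) -> dict:
        return self._rows[self.node_for(key)].get(key, {})
```

Passing a constant token_fn (for example lambda key: 42) forces every key to "collide": all rows land on the same node, yet get() still returns each row separately because the lookup uses the key itself.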

On Fri, Apr 30, 2010 at 5:37 AM, Mark Jones <MJ...@imagehawk.com> wrote:
> MD5 is not a perfect hash, it can produce collisions, how are these dealt
> with?
>
> Is there a size appended to them?
>
>
>
> If 2 keys collide, would that result in a merging of data (if the column
> names aren’t the same) or an overwrite if they were?

How does cassandra deal with collisions?

Posted by Mark Jones <MJ...@imagehawk.com>.
MD5 is not a perfect hash; it can produce collisions. How are these dealt with?
Is there a size appended to them?

If two keys collide, would that result in a merge of the data (if the column names differ) or an overwrite (if they are the same)?

Re: Basic Architecture Question

Posted by Jonathan Ellis <jb...@gmail.com>.
It will. Both the start and finish keys are inclusive, so skip the first row
of each subsequent call.

2010/5/1 Patricio Echagüe <pa...@gmail.com>:
> Roger, if you include the last read key as the start key for the next API
> call, will that retrieve the same key/row twice?
> The documentation says that both keys (start, finish) are included.
> Thanks
>
> On Thu, Apr 29, 2010 at 1:31 PM, Brandon Williams <dr...@gmail.com> wrote:
>>
>> On Thu, Apr 29, 2010 at 10:19 AM, David Boxenhorn <da...@lookin2.com>
>> wrote:
>>>
>>> So now we can do any kind of range queries, not just "for getting all
>>> keys" as Jesse said?
>>
>> With RP, the key ranges are based on the MD5 sum of the key, so it's
>> really only useful for getting all keys, or obtaining a semi-random row.
>> -Brandon
>
>
> --
> Patricio.-
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Basic Architecture Question

Posted by Patricio Echagüe <pa...@gmail.com>.
Roger, if you include the last read key as the start key for the next API
call, will that retrieve the same key/row twice?

The documentation says that both keys (start, finish) are included.
Thanks

On Thu, Apr 29, 2010 at 1:31 PM, Brandon Williams <dr...@gmail.com> wrote:

> On Thu, Apr 29, 2010 at 10:19 AM, David Boxenhorn <da...@lookin2.com> wrote:
>
>> So now we can do any kind of range queries, not just "for getting all
>> keys" as Jesse said?
>>
>
> With RP, the key ranges are based on the MD5 sum of the key, so it's really
> only useful for getting all keys, or obtaining a semi-random row.
>
> -Brandon
>



-- 
Patricio.-

Re: Basic Architecture Question

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, Apr 29, 2010 at 10:19 AM, David Boxenhorn <da...@lookin2.com> wrote:

> So now we can do any kind of range queries, not just "for getting all keys"
> as Jesse said?
>

With RP, the key ranges are based on the MD5 sum of the key, so it's really
only useful for getting all keys, or obtaining a semi-random row.
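
As a rough illustration of why this is so, the sketch below orders keys the way an MD5-based ring would visit them. It is an assumption-laden toy: md5_token and scan_order are hypothetical helper names, and the token is simplified to the MD5 digest read as an unsigned integer.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Simplified token: the MD5 digest read as an unsigned 128-bit integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def scan_order(keys):
    """Return keys in ascending token order, i.e. the order a full scan would follow."""
    return sorted(keys, key=md5_token)
```

Calling scan_order on a handful of keys returns every key exactly once, but in an order unrelated to how the keys compare as strings, which is why a range over keys is good for walking everything and not for "all keys between A and M".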

-Brandon

Re: Basic Architecture Question

Posted by David Boxenhorn <da...@lookin2.com>.
So now we can do any kind of range query, not just "for getting all keys"
as Jesse said?

On Thu, Apr 29, 2010 at 6:04 PM, Roger Schildmeijer <sc...@gmail.com> wrote:

> take a look at get_range_slices and start with "".
> then invoke get_range_slices again, but this time use the last key as the
> start key
>
> // Roger Schildmeijer
>
> On 29 Apr 2010, at 16:28, David Boxenhorn wrote:
>
> How do I do that???
>
> On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell <jesse.mcconnell@gmail.com> wrote:
>
>> apparently there is now range query support for getting all keys using the
>> RP...
>>
>> cheers,
>> jesse
>>
>> --
>> jesse mcconnell
>> jesse.mcconnell@gmail.com
>>
>>
>>
>> On Thu, Apr 29, 2010 at 08:16, David Boxenhorn <da...@lookin2.com> wrote:
>> > We want to store objects in Cassandra. In general, the mapping is quite
>> > easy. But for some kinds of objects, we want to be able to read all of them
>> > into memory.
>> >
>> > We want to use random partitioning, which means that we can't do a range
>> > query over keys (is this right?). Is there any way to get ALL the keys
>> > directly (order is not important)? Or do I need to define a separate column
>> > family, and save all the keys in a single column of that family? Or...
>> > something else?
>> >
>>
>
>
>

Re: Basic Architecture Question

Posted by Roger Schildmeijer <sc...@gmail.com>.
Take a look at get_range_slices and start with "" (the empty string) as the start key.
Then invoke get_range_slices again, but this time use the last key returned as the start key.
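
The loop described above could be sketched as follows. This is only an outline under stated assumptions: fetch_page(start_key, count) is a hypothetical callable standing in for one get_range_slices call, returning up to count (key, columns) pairs in ring order with an inclusive start key. make_fake_fetch_page is an in-memory stand-in so the sketch can run without a cluster; neither name comes from a real client library.

```python
import hashlib


def iter_all_rows(fetch_page, page_size=100):
    """Yield every (key, columns) pair once by paging with the last key as the next start."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 because pages overlap by one row")
    start_key = ""   # the empty string means "start from the beginning"
    last_key = None
    while True:
        page = fetch_page(start_key, page_size)
        for key, columns in page:
            if key == last_key:
                continue  # inclusive start key: this row was yielded on the previous page
            yield key, columns
        if len(page) < page_size:
            return  # a short page means the end of the ring was reached
        last_key = start_key = page[-1][0]


def make_fake_fetch_page(rows):
    """Hypothetical stand-in for a cluster: rows kept in MD5 order, start key inclusive."""
    ordered = sorted(rows.items(), key=lambda item: hashlib.md5(item[0].encode()).digest())

    def fetch_page(start_key, count):
        if start_key == "":
            begin = 0
        else:
            begin = next(i for i, (key, _) in enumerate(ordered) if key == start_key)
        return ordered[begin:begin + count]

    return fetch_page
```

For example, list(iter_all_rows(make_fake_fetch_page(rows), page_size=3)) returns each row of rows exactly once. Skipping by key comparison, instead of blindly dropping the first row of each page, keeps the loop correct even if a page happens not to begin with the requested start key.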

// Roger Schildmeijer

On 29 Apr 2010, at 16:28, David Boxenhorn wrote:

> How do I do that??? 
> 
> On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell <je...@gmail.com> wrote:
> apparently there is now range query support for getting all keys using the RP...
> 
> cheers,
> jesse
> 
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
> 
> 
> 
> On Thu, Apr 29, 2010 at 08:16, David Boxenhorn <da...@lookin2.com> wrote:
> > We want to store objects in Cassandra. In general, the mapping is quite
> > easy. But for some kinds of objects, we want to be able to read all of them
> > into memory.
> >
> > We want to use random partitioning, which means that we can't do a range
> > query over keys (is this right?). Is there any way to get ALL the keys
> > directly (order is not important)? Or do I need to define a separate column
> > family, and save all the keys in a single column of that family? Or...
> > something else?
> >
> 


Re: Basic Architecture Question

Posted by David Boxenhorn <da...@lookin2.com>.
How do I do that???

On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell <je...@gmail.com> wrote:

> apparently there is now range query support for getting all keys using the
> RP...
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Thu, Apr 29, 2010 at 08:16, David Boxenhorn <da...@lookin2.com> wrote:
> > We want to store objects in Cassandra. In general, the mapping is quite
> > easy. But for some kinds of objects, we want to be able to read all of them
> > into memory.
> >
> > We want to use random partitioning, which means that we can't do a range
> > query over keys (is this right?). Is there any way to get ALL the keys
> > directly (order is not important)? Or do I need to define a separate column
> > family, and save all the keys in a single column of that family? Or...
> > something else?
> >
>

Re: Basic Architecture Question

Posted by Jesse McConnell <je...@gmail.com>.
Apparently there is now range query support for getting all keys when using the RandomPartitioner (RP)...

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Thu, Apr 29, 2010 at 08:16, David Boxenhorn <da...@lookin2.com> wrote:
> We want to store objects in Cassandra. In general, the mapping is quite
> easy. But for some kinds of objects, we want to be able to read all of them
> into memory.
>
> We want to use random partitioning, which means that we can't do a range
> query over keys (is this right?). Is there any way to get ALL the keys
> directly (order is not important)? Or do I need to define a separate column
> family, and save all the keys in a single column of that family? Or...
> something else?
>