
Posted to user@cassandra.apache.org by Thorsten von Eicken <tv...@rightscale.com> on 2009/06/04 08:33:33 UTC

questions about operations

I'm looking at the Cassandra data model and operations and I'm running 
into a number of questions I have not been able to answer:

- what does get_columns_since do? I thought there's only one version of 
a column stored. I'm puzzled about the "since" aspect.

- is the Thrift interface for get_superColumn correct? It seems to me 
that "3:string columnFamily" should really be "3:string 
columnFamily_superColumnName" (I know this doesn't have any functional 
impact, just makes it hard to understand what the operation does)

- is the Thrift interface for get_slice_super correct? It seems to me 
that "3:string columnFamily_superColumnName" should really be "3:string 
columnFamily"

- what does get_key_range do? It looks like it returns a list of keys, 
but why does one have to specify a list of column family names?

- what does touch do?

Thanks much!
Thorsten - CTO RightScale

Re: questions about operations

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Jun 4, 2009 at 10:01 AM, Thorsten von Eicken <tv...@rightscale.com> wrote:
> Ah, got it, I forgot about the time-sorted CFs. So does this mean that if I
> call get_columns_since on a name-sorted CF I will get an invalid request
> exception? And also if I call get_slice_by_name_range or get_slice_by_names
> on a time-sorted CF? Or does the sorting only affect performance and not
> whether the operations are allowed or not?

My best guess from looking at the code (I haven't tested it) is that
it will try to fulfil the request on the "wrong" kind of CF, but I
don't think it actually handles that case correctly.

If you could verify that there is a bug here and file a JIRA ticket if
so, that would be helpful. :)

> Also, is there no get_slice_super_since and get_slice_super_by_name_range?

Right -- currently supercolumns are always name-sorted, and their
subcolumns are always time-sorted.
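To make the fixed ordering concrete, here is a minimal Python sketch of one row in a super column family, modeled as nested dicts (supercolumn name -> subcolumn name -> (value, timestamp)). The layout and function names are hypothetical, and newest-first order for subcolumns is an assumption. The point is only that each level has exactly one order, which is why there is no by-time slice over supercolumns or by-name slice over subcolumns.

```python
def supercolumns_in_order(row: dict) -> list:
    # Supercolumns are always name-sorted; plain string comparison is assumed.
    return sorted(row)


def subcolumns_in_order(row: dict, super_name: str) -> list:
    # Subcolumns are always time-sorted; newest first is an assumption here.
    subs = row[super_name]  # subcolumn name -> (value, timestamp)
    return sorted(subs, key=lambda name: subs[name][1], reverse=True)


# Hypothetical row: supercolumn name -> {subcolumn name: (value, timestamp)}.
row = {
    "zeta": {"a": (b"1", 10), "b": (b"2", 30)},
    "alpha": {"x": (b"3", 5), "y": (b"4", 20), "z": (b"5", 15)},
}
print(supercolumns_in_order(row))         # ['alpha', 'zeta']
print(subcolumns_in_order(row, "alpha"))  # ['y', 'z', 'x']
```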

-Jonathan

Re: questions about operations

Posted by Thorsten von Eicken <tv...@rightscale.com>.

Jonathan Ellis wrote:
> On Thu, Jun 4, 2009 at 12:33 AM, Thorsten von Eicken <tv...@rightscale.com> wrote:
>   
>> I'm looking at the Cassandra data model and operations and I'm running into
>> a number of questions I have not been able to answer:
>>
>> - what does get_columns_since do? I thought there's only one version of a
>> column stored. I'm puzzled about the "since" aspect.
>>     
>
> this is for use with time-sorted CFs or supercolumns -- it's like a
> slice by time.
>   
Ah, got it, I forgot about the time-sorted CFs. So does this mean that 
if I call get_columns_since on a name-sorted CF I will get an invalid 
request exception? And also if I call get_slice_by_name_range or 
get_slice_by_names on a time-sorted CF? Or does the sorting only affect 
performance and not whether the operations are allowed or not?

Also, is there no get_slice_super_since and get_slice_super_by_name_range?

Jonathan and Mark: thanks for the clarifications!
Thorsten

Re: questions about operations

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Jun 4, 2009 at 12:33 AM, Thorsten von Eicken <tv...@rightscale.com> wrote:
> I'm looking at the Cassandra data model and operations and I'm running into
> a number of questions I have not been able to answer:
>
> - what does get_columns_since do? I thought there's only one version of a
> column stored. I'm puzzled about the "since" aspect.

this is for use with time-sorted CFs or supercolumns -- it's like a
slice by time.
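The "since" refers to each column's timestamp rather than to multiple stored versions of one column. A minimal Python sketch of that idea, assuming integer timestamps and an inclusive cutoff (both assumptions; `Column` and `columns_since` are hypothetical names, not the Thrift API):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    # Hypothetical column: one stored version per name, plus its timestamp.
    name: str
    value: bytes
    timestamp: int


def columns_since(row: list[Column], since: int) -> list[Column]:
    """Slice a time-sorted row by time: columns with timestamp >= since.

    Newest-first order and the inclusive cutoff are assumptions.
    """
    by_time = sorted(row, key=lambda c: c.timestamp, reverse=True)
    return [c for c in by_time if c.timestamp >= since]


row = [Column("login", b"", 100), Column("click", b"", 250), Column("logout", b"", 300)]
print([c.name for c in columns_since(row, 200)])  # ['logout', 'click']
```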

> - is the Thrift interface for get_superColumn correct? It seems to me that
> "3:string columnFamily" should really be "3:string
> columnFamily_superColumnName" (I know this doesn't have any functional
> impact, just makes it hard to understand what the operation does)
>
> - is the Thrift interface for get_slice_super correct? It seems to me that
> "3:string columnFamily_superColumnName" should really be "3:string
> columnFamily"

I think you're right.

> - what does get_key_range do? It looks like it returns a list of keys, but
> why does one have to specify a list of column family names?

The CF is the unit of data storage, so it will be more efficient if
you can narrow down which CFs you want keys from. But if you pass an
empty list, it will scan all of them.
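A minimal Python sketch of why the CF list matters, assuming each CF keeps its own set of row keys: only the named CFs are scanned, and an empty list falls back to all of them. `key_range`, the storage layout, and the inclusive bounds are hypothetical, not the actual Thrift signature.

```python
def key_range(storage: dict, column_families: list, start: str, finish: str) -> list:
    """Return sorted keys k with start <= k <= finish from the chosen CFs.

    storage maps a CF name to the set of row keys stored in it (hypothetical).
    An empty column_families list means "scan every CF".
    """
    cfs = column_families or list(storage)
    keys = set()
    for cf in cfs:
        # Unknown CF names are simply skipped in this sketch.
        keys.update(k for k in storage.get(cf, set()) if start <= k <= finish)
    return sorted(keys)


storage = {"Users": {"k1", "k3"}, "Events": {"k2", "k3", "k9"}}
print(key_range(storage, ["Users"], "k0", "k5"))  # ['k1', 'k3']
print(key_range(storage, [], "k0", "k5"))         # ['k1', 'k2', 'k3']
```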

> - what does touch do?

It's intended to force the index information for the key in question
into an explicit LRU cache to save a seek on the next lookup, and also
to pull the row data into the OS filesystem cache. But the first part
is buggy and the second part works poorly with large rows, so it's
going to be removed from trunk real soon now.
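The cache half of that idea is a standard LRU. A minimal Python sketch, assuming the cached value is the row's on-disk index offset; `IndexCache` is hypothetical and this is not Cassandra's implementation.

```python
from collections import OrderedDict


class IndexCache:
    """Hypothetical LRU cache mapping a row key to its on-disk offset.

    "touch" pre-loads an entry so a later lookup can skip the index seek.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # key -> offset, oldest first

    def touch(self, key: str, offset: int) -> None:
        # Insert or refresh the entry and mark it most recently used.
        self._entries[key] = offset
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def lookup(self, key: str):
        # Return the cached offset (a hit saves a seek) or None on a miss.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]


cache = IndexCache(2)
cache.touch("a", 10)
cache.touch("b", 20)
cache.lookup("a")        # "a" is now the most recently used entry
cache.touch("c", 30)     # evicts "b"
print(cache.lookup("b"))  # None
```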

-Jonathan

Re: questions about operations

Posted by Mark Robson <ma...@gmail.com>.

>
> - what does get_key_range do? It looks like it returns a list of keys, but
> why does one have to specify a list of column family names?


It returns a list of keys that exist.

In my experiments, a key seems to "exist" if it has at least one
column in some column family that exists (i.e. has not been deleted).
Each column needs to be deleted individually for a key, as there isn't
a bulk delete (yet).
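That observation can be modeled in a few lines of Python. In this hypothetical sketch a delete simply removes the column from a dict (a simplification of how the store records deletes), and a key counts as existing while any of its column families still holds a column.

```python
def key_exists(data: dict, key: str) -> bool:
    # data: row key -> CF name -> {column name: value} (hypothetical layout).
    # A key "exists" if at least one CF still has a column; empty dicts are falsy.
    return any(data.get(key, {}).values())


def delete_column(data: dict, key: str, cf: str, column: str) -> None:
    # No bulk delete: each column is removed individually.
    data.get(key, {}).get(cf, {}).pop(column, None)


data = {"user1": {"Profile": {"name": "Ann", "email": "ann@example.com"}, "Events": {}}}
delete_column(data, "user1", "Profile", "name")
print(key_exists(data, "user1"))  # True: "email" is still there
delete_column(data, "user1", "Profile", "email")
print(key_exists(data, "user1"))  # False
```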

get_key_range gives you all the keys in a given range; what sort order /
collation it uses isn't clear, but I'd assume it would be a binary sort, so
'a' > 'Z' etc. I've mostly used numeric keys.
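If keys are compared as raw bytes (Mark's assumption, not confirmed here), uppercase sorts before lowercase and numeric keys sort as text, so "10" comes before "9". A small Python illustration of byte-wise ordering; zero-padding numbers to a fixed width is one common way to keep numeric order under such a comparison.

```python
def binary_key_order(keys: list) -> list:
    # Sort by UTF-8 bytes, i.e. a plain byte-wise (binary) comparison.
    return sorted(keys, key=lambda k: k.encode("utf-8"))


print(binary_key_order(["a", "Z", "B"]))           # ['B', 'Z', 'a']
print(binary_key_order(["10", "9", "2"]))          # ['10', '2', '9']
print(binary_key_order(["0009", "0010", "0002"]))  # ['0002', '0009', '0010']
```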

It doesn't tell you which column families or columns exist for that key.

Mark