Posted to user@cassandra.apache.org by Philippe <wa...@gmail.com> on 2010/04/12 01:31:25 UTC

Two dimensional matrices

Hello,

I would like to know whether the following is possible with Cassandra.
From my understanding of key and column slices it is, but I am just beginning
to get my head around Cassandra...


I have data that is two dimensional, time varying (think of a grid). At each
cell of this grid, I store a binary array.
My data model will be

   - single keyspace
   - key = {Y dimension}
   - super column family = {type of data represented in each cell}
   - super column  = {time = week or month}
   - column ={X dimension}
   - value = { binary}
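
As a rough illustration of the layout above (not Cassandra client code), the nesting can be pictured as Python dictionaries, one per super column family. The key names such as `y001` and `2010-W14` and the helper `get_cell` are made up for this sketch.

```python
# In-memory stand-in for the proposed layout (not a Cassandra client).
# Nesting: row key (Y) -> super column (time frame) -> column (X) -> binary value.
grid = {
    "y001": {
        "2010-W14": {"x001": b"\x00\x01", "x002": b"\x02\x03"},
        "2010-W15": {"x001": b"\x04\x05"},
    },
    "y002": {
        "2010-W14": {"x001": b"\x06\x07"},
    },
}


def get_cell(grid, y, time_frame, x):
    """Look up a single cell by row key, super column, and column name."""
    return grid[y][time_frame][x]


cell = get_cell(grid, "y001", "2010-W14", "x002")
```

Zero-padded names are used so that plain string comparison matches the intended coordinate order.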

Will I be able to retrieve all values from a rectangle of this grid in a
single call to Cassandra for a given SCF and SC? Will the result associate
each value with its key and column?
Does it matter performance-wise if it's a single call?

Thanks
Philippe

Re: Two dimensional matrices

Posted by Eric Evans <ee...@rackspace.com>.
On Tue, 2010-04-13 at 00:45 +0200, Philippe wrote:
> > > However, you are also saying there is no way to also take into account
> > > the "timeFrame" supercolumn in the same API call ? IE, it is not
> > > possible to get back a data structure keyed by
> > > 'key,supercolumn,column' hence y,x and timeframe which I can then
> > > process to my heart's delight ?
> >
> > If you're talking about constructing predicates to slice on both time
> > *and* X coordinate, then no. You can omit the super column name from the
> > ColumnParent and return a slice of super columns (by time period)
> > complete with all contained sub-columns, but you can't have it both
> > ways, no.
>
> Eric, I'm trying to get my head around this...
> 
> 
> If I "omit the super column name" and do the query as you mentioned
> in your previous email, then you are saying it will return all columns
> corresponding to the column range of all super columns corresponding
> to the key range.
> This means it is possible to get a rectangular slice of the grid AND
> to get the "third dimension" which is time in my case, the only catch
> being that I cannot limit the amount of data retrieved in the 3rd
> dimension (timeframe).
> 
> 
> Is this correct ?

No, what I mean is that you can perform a slice that returns either
sub-columns, or super columns. In the former, the column names you are
slicing on are the sub-columns (X coords), in the latter it is super
columns (time). So:

On X coords (same as my previous mail):

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, timeFrame),
    SlicePredicate(
        slice_range=SliceRange(xstart, xend, false, colCount)
    ),
    ystart,
    yend,
    rowCount,
    consistencyLevel,
)

The "columns" attribute of the KeySlice structs returned will contain
the sub-columns contained in timeFrame that match your predicate.
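
To make the shape of that result concrete, here is a minimal in-memory sketch in Python. It is not the Thrift API: `slice_sub_columns` and the sample `grid` are hypothetical, and it assumes row keys and column names compare as plain strings (in a real cluster the key ordering depends on the partitioner).

```python
# Hypothetical in-memory model: row key (Y) -> super column (time) -> column (X) -> value.
grid = {
    "y001": {"2010-W14": {"x001": b"a", "x002": b"b", "x003": b"c"}},
    "y002": {"2010-W14": {"x002": b"d"}, "2010-W15": {"x001": b"e"}},
    "y003": {"2010-W14": {"x001": b"f"}},
}


def slice_sub_columns(grid, time_frame, x_start, x_end, y_start, y_end):
    """Mimic a range slice whose parent names one super column.

    Returns {row key: {x: value}} restricted to the X and Y bounds (inclusive).
    """
    result = {}
    for y in sorted(grid):
        if y_start <= y <= y_end:
            columns = grid[y].get(time_frame, {})
            result[y] = {
                x: columns[x] for x in sorted(columns) if x_start <= x <= x_end
            }
    return result


rectangle = slice_sub_columns(grid, "2010-W14", "x001", "x002", "y001", "y002")
```

Only sub-columns of the one named time frame come back; the other time frames stored under the same keys are not touched.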

On time:

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, null),
    SlicePredicate(
        slice_range=SliceRange(timeStart, timeEnd, false, colCount)
    ),
    ystart,
    yend,
    rowCount,
    consistencyLevel,
)

The "columns" attribute of the KeySlice structs returned will contain
the super columns that match the predicate. Each of these super columns
will contain *all* of the sub-columns.
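
The same kind of in-memory sketch shows how this second form differs. Again this is not the Thrift API: `slice_super_columns` and the sample data are hypothetical, and string comparison stands in for the real column ordering.

```python
# Hypothetical in-memory model: row key (Y) -> super column (time) -> column (X) -> value.
grid = {
    "y001": {"2010-W14": {"x001": b"a", "x002": b"b"}, "2010-W15": {"x001": b"c"}},
    "y002": {"2010-W15": {"x003": b"d"}, "2010-W16": {"x001": b"e"}},
}


def slice_super_columns(grid, t_start, t_end, y_start, y_end):
    """Mimic a range slice with no super column named in the parent.

    The predicate bounds apply to super column names (time); every matching
    super column is returned with all of its sub-columns, unfiltered on X.
    """
    result = {}
    for y in sorted(grid):
        if y_start <= y <= y_end:
            supers = grid[y]
            result[y] = {
                t: dict(supers[t]) for t in sorted(supers) if t_start <= t <= t_end
            }
    return result


by_time = slice_super_columns(grid, "2010-W15", "2010-W15", "y001", "y002")
```

Note that `x003` is returned for `y002` even though no X bound was asked for: there is nowhere in this form of the call to express one.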

-- 
Eric Evans
eevans@rackspace.com


Re: Two dimensional matrices

Posted by Philippe <wa...@gmail.com>.
>
> > However, you are also saying there is no way to also take into account
> > the "timeFrame" supercolumn in the same API call ? IE, it is not
> > possible to get back a data structure keyed by
> > 'key,supercolumn,column' hence y,x and timeframe which I can then
> > process to my heart's delight ?
>
> If you're talking about constructing predicates to slice on both time
> *and* X coordinate, then no. You can omit the super column name from the
> ColumnParent and return a slice of super columns (by time period)
> complete with all contained sub-columns, but you can't have it both
> ways, no.

Eric, I'm trying to get my head around this...

If I "omit the super column name" and do the query as you mentioned in your
previous email, then you are saying it will return all columns corresponding
to the column range of all super columns corresponding to the key range.
This means it is possible to get a rectangular slice of the grid AND to get
the "third dimension" which is time in my case, the only catch being that I
cannot limit the amount of data retrieved in the 3rd dimension (timeframe).

Is this correct ?

Philippe

Re: Two dimensional matrices

Posted by Eric Evans <ee...@rackspace.com>.
On Tue, 2010-04-13 at 00:23 +0200, Philippe wrote:
> > Alright, so assuming we're looking for a slice of the grid against a
> > given time-frame, that would look something like:
> >
> > get_range_slice(
> >    keyspaceName,
> >    ColumnParent(CFname, timeFrame),
> >    SlicePredicate(
> >        slice_range=SliceRange(xstart, xend, false, colCount)
> >    ),
> >    ystart,
> >    yend,
> >    rowCount,
> >    consistencyLevel,
> > )
> >
> > Does that help?
> >
> Yes it confirms my understanding, thanks.
> 
> However, you are also saying there is no way to also take into account
> the "timeFrame" supercolumn in the same API call ? IE, it is not
> possible to get back a data structure keyed by
> 'key,supercolumn,column' hence y,x and timeframe which I can then
> process to my heart's delight ? 

If you're talking about constructing predicates to slice on both time
*and* X coordinate, then no. You can omit the super column name from the
ColumnParent and return a slice of super columns (by time period)
complete with all contained sub-columns, but you can't have it both
ways, no.
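
One way a client might still assemble a structure keyed by (y, time, x) is to issue one sub-column slice per time frame and merge the results. The Python sketch below only illustrates that idea under assumptions: `fetch_rectangle` is a hypothetical in-memory stand-in for a single range slice call, and `fetch_box` and the sample data are invented for the example.

```python
# Hypothetical in-memory model: row key (Y) -> super column (time) -> column (X) -> value.
grid = {
    "y001": {"2010-W14": {"x001": b"a", "x005": b"b"}, "2010-W15": {"x001": b"c"}},
    "y002": {"2010-W14": {"x002": b"d"}, "2010-W16": {"x001": b"e"}},
}


def fetch_rectangle(time_frame, x_start, x_end, y_start, y_end):
    """Hypothetical stand-in for one sub-column range slice call."""
    return {
        y: {
            x: v
            for x, v in supers.get(time_frame, {}).items()
            if x_start <= x <= x_end
        }
        for y, supers in grid.items()
        if y_start <= y <= y_end
    }


def fetch_box(time_frames, x_start, x_end, y_start, y_end):
    """Issue one call per time frame and key the merged result by (y, time, x)."""
    box = {}
    for t in time_frames:
        rows = fetch_rectangle(t, x_start, x_end, y_start, y_end)
        for y, columns in rows.items():
            for x, value in columns.items():
                box[(y, t, x)] = value
    return box


box = fetch_box(["2010-W14", "2010-W15"], "x001", "x002", "y001", "y002")
```

The trade-off is one network round trip per time frame, so this only makes sense when the number of weeks or months requested is small.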

-- 
Eric Evans
eevans@rackspace.com


Re: Two dimensional matrices

Posted by Philippe <wa...@gmail.com>.
>
> Alright, so assuming we're looking for a slice of the grid against a
> given time-frame, that would look something like:
>
> get_range_slice(
>    keyspaceName,
>    ColumnParent(CFname, timeFrame),
>    SlicePredicate(
>        slice_range=SliceRange(xstart, xend, false, colCount)
>    ),
>    ystart,
>    yend,
>    rowCount,
>    consistencyLevel,
> )
>
> Does that help?
>
Yes it confirms my understanding, thanks.

However, you are also saying there is no way to also take into account the
"timeFrame" supercolumn in the same API call ? IE, it is not possible to get
back a data structure keyed by 'key,supercolumn,column' hence y,x and
timeframe which I can then process to my heart's delight ?

Philippe

Re: Two dimensional matrices

Posted by Eric Evans <ee...@rackspace.com>.
On Mon, 2010-04-12 at 22:40 +0200, Philippe wrote:
> > If I understand what you're asking, a rectangle (identified by X and Y
> > coordinates for a time-frame), will boil down to a single column. There
> > are certainly no problems with retrieving a single sub-column from a
> > super column.
> >
> I realize I wasn't clear enough. Getting cell {x,y} is easy, I understand
> that.
> I am interested in getting a slice of that grid : all cells {x,y} where
> 
>    - x_min<=x<=x_max
>    - y_min<=y<=y_max
> 
> My understanding from the docs is that get_range_slices would do it but I
> would like confirmation.
> In fact, I believe this is the same question as Dop Sun is asking.

Alright, so assuming we're looking for a slice of the grid against a
given time-frame, that would look something like:

get_range_slice(
    keyspaceName,
    ColumnParent(CFname, timeFrame),
    SlicePredicate(
        slice_range=SliceRange(xstart, xend, false, colCount)
    ),
    ystart,
    yend,
    rowCount,
    consistencyLevel,
)

Does that help?

> > > Will the result associate each value with its key and column ?
> > The result will be a column that contains the binary value, which you
> > obtained by using key, column family, and super column name. So, yes.
> >
> What about this new case ?

The result will be a collection of KeySlices representing the key and
corresponding columns, (so yes).
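
For illustration, a client could flatten such a result into a mapping keyed by (y, x). The Python sketch below uses simplified stand-ins: the `KeySlice` and `Column` dataclasses here are assumptions that keep only a key, a column name and a value, and the real Thrift structs carry more than that. `to_cells` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Column:
    """Simplified stand-in for a returned column: X coordinate and binary value."""
    name: str
    value: bytes


@dataclass
class KeySlice:
    """Simplified stand-in for one returned row: Y key plus its matching columns."""
    key: str
    columns: List[Column]


def to_cells(key_slices: List[KeySlice]) -> Dict[Tuple[str, str], bytes]:
    """Flatten the per-row slices into a {(y, x): value} mapping."""
    return {
        (ks.key, col.name): col.value
        for ks in key_slices
        for col in ks.columns
    }


slices = [
    KeySlice("y001", [Column("x001", b"a"), Column("x002", b"b")]),
    KeySlice("y002", [Column("x001", b"c")]),
]
cells = to_cells(slices)
```

Each value stays associated with both its row key and its column name, which is the association asked about above.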

-- 
Eric Evans
eevans@rackspace.com


Re: Two dimensional matrices

Posted by Philippe <wa...@gmail.com>.
Eric, Dop,
Thanks for your answers.

> If I understand what you're asking, a rectangle (identified by X and Y
> coordinates for a time-frame), will boil down to a single column. There
> are certainly no problems with retrieving a single sub-column from a
> super column.
>
I realize I wasn't clear enough. Getting cell {x,y} is easy, I understand
that.
I am interested in getting a slice of that grid : all cells {x,y} where

   - x_min<=x<=x_max
   - y_min<=y<=y_max

My understanding from the docs is that get_range_slices would do it but I
would like confirmation.
In fact, I believe this is the same question as Dop Sun is asking.



> > Will the result associate each value with its key and column ?
> The result will be a column that contains the binary value, which you
> obtained by using key, column family, and super column name. So, yes.
>
What about this new case ?

Philippe

Re: Two dimensional matrices

Posted by Eric Evans <ee...@rackspace.com>.
On Mon, 2010-04-12 at 01:31 +0200, Philippe wrote:
> I have data that is two dimensional, time varying (think of a grid). At each
> cell of this grid, I store a binary array.
> My data model will be
> 
>    - single keyspace
>    - key = {Y dimension}
>    - super column family = {type of data represented in each cell}
>    - super column  = {time = week or month}
>    - column ={X dimension}
>    - value = { binary}
> 
> Will I be able to retrieve all values from a rectangle from this grid
> in a single call to cassandra for given SCF and SC ?

If I understand what you're asking, a rectangle (identified by X and Y
coordinates for a time-frame), will boil down to a single column. There
are certainly no problems with retrieving a single sub-column from a
super column.
 
> Will the result associate each value with its key and column ?

The result will be a column that contains the binary value, which you
obtained by using key, column family, and super column name. So, yes.

> Does it matter if it's a single call performance wise ? 

Yes, if for no other reason than it requires another round-trip across
the network.

-- 
Eric Evans
eevans@rackspace.com


RE: Two dimensional matrices

Posted by Dop Sun <su...@dopsun.com>.
I may be wrong (I'm also new to Cassandra), but it looks like we can only
query a single super column per query, since that value is specified in the
ColumnParent parameter. That means you can only query a single week or month
(your super column) at a time.

 
