Posted to user@cassandra.apache.org by T Akhayo <t....@gmail.com> on 2012/10/03 21:00:27 UTC

Simple data model for 1 simple range query?

Good evening,

I have a quite simple data model. Pseudo CQL code:

create table bars(
timeframe int,
date Date,
info1 double,
info2 double,
..
primary key( timeframe, date )
)

My most important query (which might actually be the only one) is:
select * from bars where timeframe=X and date>Y and date <Z

I came to this model because I read in the past (when 0.7 came out) that
Cassandra was very fast at range queries (using a slice method) when the
fields were keys. And now with CQL all the nasty details are hidden (I have
not tested this yet ;-) )

Is it correct that the above model is a good and fast solution for my query?

Kind regards.

Re: Simple data model for 1 simple range query?

Posted by T Akhayo <t....@gmail.com>.

Hi Dean,

Thank you for your reply, I appreciate the help. I managed to get my data
model into Cassandra, and I have already inserted data and run the query, but
I don't yet have enough data to do proper benchmarking. I'm now trying to load
a huge amount of data using SSTableSimpleUnsortedWriter, because doing it with
insert queries takes quite a while, but it is quite challenging to get this
one working.

Kind regards,


Re: Simple data model for 1 simple range query?

Posted by "Hiller, Dean" <De...@nrel.gov>.

Is timeframe/date your composite key, where timeframe is the first time of a partition of time (i.e. if you partition by month, it is the very first time of that month)? If so, then yes, it will be very fast. The smaller your partitions are, the smaller your indexes are as well (i.e. B-trees, which you can grow pretty big). Realize that you always have to use timeframe with equals (=), NOT >, <, <=, >=, but on the other columns you can use the other operators.
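
To make the point above concrete, here is a rough Python sketch of why this layout answers the range query quickly. It is only a toy model, not how Cassandra is implemented: the Bars class, month_bucket() and the yyyymm encoding of timeframe are assumptions made up for illustration. The partition key is looked up by equality, and within that partition the rows stay sorted by date, so "date > Y and date < Z" becomes one contiguous slice.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime


def month_bucket(ts: datetime) -> int:
    """Hypothetical partition key: an int (yyyymm) standing in for the start of ts's month."""
    return ts.year * 100 + ts.month


class Bars:
    """Toy model of the table: partition key -> rows kept sorted by date."""

    def __init__(self):
        self._dates = defaultdict(list)  # timeframe -> sorted list of dates
        self._rows = defaultdict(list)   # timeframe -> rows in the same order

    def insert(self, timeframe, date, info1, info2):
        dates = self._dates[timeframe]
        i = bisect_left(dates, date)
        if i < len(dates) and dates[i] == date:
            # Same primary key (timeframe, date): overwrite the existing row.
            self._rows[timeframe][i] = (date, info1, info2)
        else:
            dates.insert(i, date)
            self._rows[timeframe].insert(i, (date, info1, info2))

    def select(self, timeframe, after, before):
        """Models: select * from bars where timeframe=X and date>Y and date<Z"""
        dates = self._dates.get(timeframe, [])  # equality lookup on the partition key
        lo = bisect_right(dates, after)         # first date strictly greater than Y
        hi = bisect_left(dates, before)         # first date not less than Z
        return self._rows.get(timeframe, [])[lo:hi]


bars = Bars()
for day in (1, 5, 9, 20):
    ts = datetime(2012, 10, day)
    bars.insert(month_bucket(ts), ts, 1.0 * day, 2.0 * day)

rows = bars.select(201210, datetime(2012, 10, 1), datetime(2012, 10, 20))
print([r[0].day for r in rows])  # prints [5, 9]
```

Note that select() never scans across partitions. That is the reason timeframe has to be given with equals, while date can take a range.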

Also, if you ever find a need to partition the same data twice, you can always look into PlayOrm with multi-partitioning and its Scalable SQL, which can do joins when necessary.

Later,
Dean
