Posted to user@cassandra.apache.org by Bartosz Kołodziej <ba...@gmail.com> on 2010/07/03 00:29:11 UTC

Need a little help with data model design

I'm new to Cassandra, and I want to use it to store:

loggers = { // (super)ColumnFamily ?
    logger1 : { // row inside super CF ?
        timestamp1 : {
            value : 10
        },
        timestamp2 : {
            value : 12
        }
        (many many many more)
    }
    logger2 : { // logger of a different type (in this example it logs 3 values instead of 1)
        timestamp1 : {
            v : 300,
            c : 123,
            s : 12.13
        },
        timestamp2 : {
            v : 300,
            c : 123,
            s : 12.13
        }
        (many many many more)
    }
    (many many many more)
}

The only ways I will be accessing this data are:
- example: fetch a slice of data from logger2 (start = 1278009131 (timestamp),
  end = 1278109131),
     expecting a sorted array of data.
- example: fetch a slice of data from (logger2 and logger10 and logger20 and
  logger1234) (start = 1278009131 (timestamp), end = 1278109131),
     expecting a map of sorted arrays of data. [It is basically N queries of
     the first type.]
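
The two access patterns above can be sketched in Python against a plain in-memory dict. This is only an illustration of the expected inputs and outputs, not Cassandra client code; the names `loggers`, `fetch_slice` and `fetch_slices`, and the sample timestamps, are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for the store:
# logger name -> list of (timestamp, values) pairs, kept sorted by timestamp.
loggers = {
    "logger1": [
        (1278009000, {"value": 10}),
        (1278009200, {"value": 12}),
    ],
    "logger2": [
        (1278009131, {"v": 300, "c": 123, "s": 12.13}),
        (1278050000, {"v": 301, "c": 124, "s": 12.5}),
        (1278200000, {"v": 302, "c": 125, "s": 13.0}),
    ],
}


def fetch_slice(logger, start, end):
    """Return the entries of one logger with start <= timestamp <= end, in order."""
    entries = loggers.get(logger, [])
    timestamps = [ts for ts, _ in entries]
    lo = bisect_left(timestamps, start)   # first entry at or after start
    hi = bisect_right(timestamps, end)    # one past the last entry at or before end
    return entries[lo:hi]


def fetch_slices(names, start, end):
    """Second access pattern: N queries of the first type, collected into a map."""
    return {name: fetch_slice(name, start, end) for name in names}


if __name__ == "__main__":
    print(fetch_slice("logger2", 1278009131, 1278109131))
    print(fetch_slices(["logger1", "logger2"], 1278009131, 1278109131))
```

Whatever layout is chosen in Cassandra, it has to return these same results: a sorted list per logger, and a map of such lists for several loggers.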

Is this the right definition for the above?

<ColumnFamily CompareWith="TimeUUIDType" ColumnType="Super"
    CompareSubcolumnsWith="BytesType" Name="loggers"/>

What's the best way to model this data in Cassandra (keeping in mind
partitioning and other important stuff)?

Re: Need a little help with data model design

Posted by Jonathan Ellis <jb...@gmail.com>.

I would expect a row per log entry to be substantially faster to query.

2010/7/5 Bartosz Kołodziej <ba...@gmail.com>:
> I have a big and dynamic number of loggers.
> According to https://issues.apache.org/jira/browse/CASSANDRA-16, the 2GB
> size limit is no longer an issue in 0.7 (btw, mnesia has a similar issue ;-) ).
> I think I can go with the svn release at the moment.
> Solving this with a composite key (logger+timestamp) would require
> OrderPreservingPartitioner to make range queries efficient, while in the first
> approach I can go with RandomPartitioner (data would be partitioned by
> logger - simple and effective).
> Btw, which model provides faster queries?
> (I only need to get a slice (timestamp1 to timestamp2) of data for logger X.)
> On Mon, Jul 5, 2010 at 6:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> You don't want to have all the data from a single logger in a single
>> row b/c of the 2GB size limit.
>>
>> If you have a small, static number of loggers you could create one CF
>> per logger and use timestamp as the row key.  Otherwise use a
>> composite key (logger+timestamp) as the key in a single CF.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Need a little help with data model design

Posted by Bartosz Kołodziej <ba...@gmail.com>.

I have a big and dynamic number of loggers.

According to https://issues.apache.org/jira/browse/CASSANDRA-16, the 2GB
size limit is no longer an issue in 0.7 (btw, mnesia has a similar issue ;-) ).
I think I can go with the svn release at the moment.

Solving this with a composite key (logger+timestamp) would require
OrderPreservingPartitioner to make range queries efficient, while in the first
approach I can go with RandomPartitioner (data would be partitioned by
logger - simple and effective).
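
A rough Python sketch of why the partitioner matters for the composite-key layout: RandomPartitioner places rows by an MD5-based token, so keys that are neighbours in key order are generally not neighbours in token order. The token is approximated here as the MD5 digest read as an integer, which is a simplification, and the `logger:timestamp` key format and the name `md5_token` are assumptions made for the example.

```python
import hashlib


def md5_token(key):
    """Simplified stand-in for a hash-partitioner token: MD5 digest as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Five consecutive composite keys for one logger (hypothetical key format).
keys = [f"logger2:{ts:010d}" for ts in range(1278009131, 1278009131 + 5)]

# Order the same keys by token, the way a hash partitioner would lay them out.
by_token = sorted(keys, key=md5_token)

if __name__ == "__main__":
    for key in by_token:
        print(md5_token(key), key)
```

Printing the keys in token order usually shows them shuffled relative to timestamp order, which is why a contiguous key-range scan is not available without an order-preserving partitioner.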

Btw, which model provides faster queries?
(I only need to get a slice (timestamp1 to timestamp2) of data for logger X.)

On Mon, Jul 5, 2010 at 6:23 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> You don't want to have all the data from a single logger in a single
> row b/c of the 2GB size limit.
>
> If you have a small, static number of loggers you could create one CF
> per logger and use timestamp as the row key.  Otherwise use a
> composite key (logger+timestamp) as the key in a single CF.

Re: Need a little help with data model design

Posted by Jonathan Ellis <jb...@gmail.com>.

You don't want to have all the data from a single logger in a single
row b/c of the 2GB size limit.

If you have a small, static number of loggers you could create one CF
per logger and use timestamp as the row key.  Otherwise use a
composite key (logger+timestamp) as the key in a single CF.
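
One possible way to build such a composite key, sketched in Python. The `logger:timestamp` format, the 10-digit zero padding and the helper names are assumptions made for illustration, not something Cassandra prescribes. The sorted list stands in for key order under an order-preserving partitioner, and logger names are assumed not to contain the `:` separator.

```python
from bisect import bisect_left, bisect_right


def composite_key(logger, timestamp):
    """Build 'logger:timestamp' with a fixed-width, zero-padded timestamp.

    Fixed width makes lexicographic order agree with numeric order, so all
    keys of one logger form a contiguous, time-ordered run.
    """
    return f"{logger}:{timestamp:010d}"


def range_scan(sorted_keys, logger, start, end):
    """Return the keys of one logger with start <= timestamp <= end."""
    lo = bisect_left(sorted_keys, composite_key(logger, start))
    hi = bisect_right(sorted_keys, composite_key(logger, end))
    return sorted_keys[lo:hi]


# Sample keys for three loggers, sorted the way ordered row keys would be.
keys = sorted(
    composite_key(name, ts)
    for name in ("logger1", "logger10", "logger2")
    for ts in (1278000000, 1278050000, 1278200000)
)

if __name__ == "__main__":
    print(range_scan(keys, "logger1", 1278009131, 1278109131))
```

Because both range bounds share the `logger1:` prefix, the scan cannot pick up keys belonging to `logger10` or `logger2`.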



