Posted to user@cassandra.apache.org by Asit KAUSHIK <as...@gmail.com> on 2015/02/04 08:49:50 UTC

Suggestion Date as a Partition key

Hi all,
Please excuse me if my questions are those of a novice user. Following on from
my last table design issue, I am thinking of creating a partition key on the
date (only), since our search criteria will always include a time frame.

So my questions are:

1) Is this a good idea, given that I don't have any other field to combine with
the date key? All the time series examples point to a combination; here I am
talking only about a date, which I would convert to an int format like
04022015.
2) Is there any detailed doc or write-up on how to identify how much data is on
which node, so that I can see the distribution of data across the nodes?

For your reference, my table structure is below:


CREATE TABLE logentries (
    eventDate bigint PRIMARY KEY,
    context text,
    date_to_hour bigint,
    durationinseconds float,
    eventtimestamputc timestamp,
    ipaddress inet,
    logentrytimestamputc timestamp,
    loglevel int,
    logmessagestring text,
    logsequence int,
    message text,
    modulename text,
    productname text,
    searchitems map<text, text>,
    servername text,
    sessionname text,
    stacktrace text,
    threadname text,
    timefinishutc timestamp,
    timestartutc timestamp,
    urihostname text,
    uripathvalue text,
    uriquerystring text,
    useragentstring text,
    username text
);

Thanks so much to all for the help.

Cheers
Asit

Re: Suggestion Date as a Partition key

Posted by Srinivasa T N <se...@gmail.com>.
I would not suggest using only the date as the partition key. That makes all
the records for a single day go into a single partition, which puts the load
on one partition while other partitions sit idle. Try adding some other field
to the partition key (the first part of the primary key) as well, so that the
load is distributed.
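
As a rough sketch of what that could look like (untested, and based on
assumptions rather than on your actual workload): the table name
logentries_by_day, the bucket column, and the choice of 4 buckets are all
hypothetical, and only a few of your columns are shown. The idea is a composite
partition key of (date, bucket), with the timestamp as a clustering column so
that rows within a partition are ordered by time.

```cql
-- Hypothetical variant of logentries; names and bucket count are assumptions.
CREATE TABLE logentries_by_day (
    eventdate int,                -- day as yyyymmdd, e.g. 20150204 (sorts chronologically;
                                  -- 04022015 stored as an int would lose its leading zero)
    bucket int,                   -- assumed: computed by the writer, e.g. hash(servername) % 4
    eventtimestamputc timestamp,
    logsequence int,              -- assumed to make rows unique within one timestamp
    loglevel int,
    servername text,
    message text,
    PRIMARY KEY ((eventdate, bucket), eventtimestamputc, logsequence)
);

-- Reading one hour of one day: run this once per bucket (0..3) and merge the results.
SELECT * FROM logentries_by_day
 WHERE eventdate = 20150204
   AND bucket = 0
   AND eventtimestamputc >= '2015-02-04 08:00:00+0000'
   AND eventtimestamputc <  '2015-02-04 09:00:00+0000';
```

The trade-off is that a query for a whole day has to touch every bucket, so the
bucket count should stay small enough to keep reads cheap while still
spreading each day's writes over several partitions.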

Check this:
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/#.VNHTQTWli1E
(In particular: *Choose the proper row key – it’s your “shard key”*)
