Posted to user@cassandra.apache.org by Benjamin Christenson <be...@kineticdata.com> on 2020/06/09 13:51:35 UTC

Partition size, limits, recommendations for tables where all columns are part of the primary key

Hello all, I am doing some data modeling and want to make sure that I
understand some nuances of cell counts, partition sizes, and related
recommendations.  Am I correct in my understanding that tables in which
every column is part of the primary key will always have 0 cells?

For example, using https://cql-calculator.herokuapp.com/, I tested the
following table definition with 1000000 (1 million) rows per partition and
an average value size of 255 bytes, and it returned that there were 0 cells
and the partition took up 32 bytes total:
  CREATE TABLE IF NOT EXISTS widgets (
    id timeuuid,
    key_id timeuuid,
    parent_id timeuuid,
    value text,
    PRIMARY KEY ((parent_id, key_id), value, id)
  );

Obviously the total amount of disk space for this table must be more than
32 bytes.  In this situation, how should I be reasoning about partition
sizes (in terms of the 2B cell limit, and 100MB-400MB partition size
limit)?  Additionally, are there other limits / potential performance
issues I should be concerned about?

Ben Christenson
Developer

Kinetic Data, Inc.
Your business. Your process.
651-556-0937  |  ben.christenson@kineticdata.com
www.kineticdata.com  |  community.kineticdata.com

Re: Partition size, limits, recommendations for tables where all columns are part of the primary key

Posted by Alex Ott <al...@gmail.com>.
Hi

Yes, basically rows have no cells as everything is in the partition
key/clustering columns.
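
Zero cells does not mean zero bytes, though: the clustering values are
still written with every row. Below is a rough back-of-the-envelope sketch
in Python (not the calculator's formula) for the widgets table above. It
assumes 16 bytes per timeuuid and the 255-byte average value size from the
question; the function name and the per_row_overhead parameter are
hypothetical, and compression, row headers, timestamps, and index entries
are ignored, so treat the result as an order-of-magnitude figure only.

```python
TIMEUUID_BYTES = 16  # a timeuuid is a 128-bit UUID


def estimate_partition_bytes(rows, avg_value_bytes, per_row_overhead=0):
    """Rough uncompressed size of one widgets partition (hypothetical helper).

    per_row_overhead is a placeholder for row headers, timestamps, etc.;
    it defaults to 0 because the real figure depends on the storage format.
    """
    # Partition key (parent_id, key_id) is stored once per partition.
    partition_key = 2 * TIMEUUID_BYTES
    # Clustering columns (value, id) are stored with every row.
    clustering_per_row = avg_value_bytes + TIMEUUID_BYTES
    return partition_key + rows * (clustering_per_row + per_row_overhead)


estimate = estimate_partition_bytes(rows=1_000_000, avg_value_bytes=255)
print(f"{estimate / 1_000_000:.0f} MB")
```

Under these assumptions, 1 million rows come to roughly 271 MB before
compression, which already falls inside the 100MB-400MB range mentioned in
the question. The 32 bytes reported by the calculator presumably cover only
the two timeuuids of the partition key.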

You can always look into the data using sstabledump (this output is from
the DSE 6.7 instance that I have running):

 sstabledump ac-1-bti-Data.db
[
  {
    "partition" : {
      "key" : [ "977eb1f1-aa5b-11ea-b91a-db426f6f892c", "977ed900-aa5b-11ea-b91a-db426f6f892c" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 78,
        "clustering" : [ "test", "977ed901-aa5b-11ea-b91a-db426f6f892c" ],
        "liveness_info" : { "tstamp" : "2020-06-09T14:14:54.863249Z" },
        "cells" : [ ]
      }
    ]
  }
]
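
To check this across a whole SSTable rather than by eye, the JSON can be
summarized with a few lines of Python. This is a minimal sketch that
assumes the output has the shape shown above (a list of entries with
"partition" and "rows"); the summarize function is a hypothetical helper,
not part of any Cassandra tooling, and the sample is the dump above.

```python
import json


def summarize(dump_text):
    """Return (partition key, row count, cell count) for each partition."""
    summary = []
    for entry in json.loads(dump_text):
        # Keep only regular rows; other entry types are skipped.
        rows = [r for r in entry.get("rows", []) if r.get("type") == "row"]
        cells = sum(len(r.get("cells", [])) for r in rows)
        summary.append((entry["partition"]["key"], len(rows), cells))
    return summary


sample = """
[
  {
    "partition" : {
      "key" : [ "977eb1f1-aa5b-11ea-b91a-db426f6f892c",
                "977ed900-aa5b-11ea-b91a-db426f6f892c" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 78,
        "clustering" : [ "test", "977ed901-aa5b-11ea-b91a-db426f6f892c" ],
        "liveness_info" : { "tstamp" : "2020-06-09T14:14:54.863249Z" },
        "cells" : [ ]
      }
    ]
  }
]
"""

for key, row_count, cell_count in summarize(sample):
    print(key, "rows:", row_count, "cells:", cell_count)
```

For a real table, the sstabledump output would be saved to a file and its
contents passed to summarize in place of the sample string.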

P.S. You can play with your schema and run some performance tests using
NoSQLBench: https://github.com/nosqlbench/



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)