
Posted to user@cassandra.apache.org by Bill Walters <bi...@gmail.com> on 2017/10/30 06:33:33 UTC

Would a User Defined Type (UDT) nested in a LIST collection column give good read performance?

Hi Everyone,


We need some help deciding whether to use User Defined Types (UDTs) nested
in LIST collection columns in our table.
In a couple of months, we plan to roll out a new solution that serves a
read-heavy use case.
We have one big table that will hold around 250 million records, with 2
LIST columns holding UDT elements (UDT nested in LIST).

Below is the cluster setup we are planning:

*Cassandra version:* DSE 5.0.7
*No. of data centers:* 2 (AWS East and AWS West regions)
*No. of nodes:* 12 nodes (6 nodes in AWS East and 6 nodes in AWS West)
*Replication factor:* 3 in each data center
*Read consistency level:* LOCAL_QUORUM
*Table compaction strategy:* LeveledCompactionStrategy
*Use case:* Read-heavy
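
To put the setup above into numbers, here is a minimal back-of-envelope sketch. It assumes tokens are spread evenly across the 6 nodes of each data center; the function names are illustrative only. LOCAL_QUORUM requires floor(RF / 2) + 1 replicas in the local data center to respond.

```python
# Back-of-envelope numbers for the planned cluster. Assumes an even token
# distribution, so each node in a DC stores an equal share of that DC's replicas.


def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM / LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def rows_per_node(total_rows: int, replication_factor: int, nodes_in_dc: int) -> float:
    """Approximate number of row replicas stored on each node of one DC."""
    return total_rows * replication_factor / nodes_in_dc


RF_PER_DC = 3             # from the setup above
NODES_PER_DC = 6          # from the setup above
TOTAL_ROWS = 250_000_000  # from the setup above

local_quorum = quorum(RF_PER_DC)
# A LOCAL_QUORUM read still succeeds if this many local replicas of a partition are down.
tolerated_down = RF_PER_DC - local_quorum
per_node = rows_per_node(TOTAL_ROWS, RF_PER_DC, NODES_PER_DC)

print(f"LOCAL_QUORUM waits for {local_quorum} of {RF_PER_DC} local replicas")
print(f"tolerates {tolerated_down} unavailable local replica(s) per partition")
print(f"about {per_node:,.0f} rows per node")
```

With these figures each read waits for 2 of the 3 local replicas, and each node stores roughly 125 million row replicas, each carrying both list columns.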

Table Schema:

CREATE TYPE account (
    acct_system_id text,
    acct_id text,
    acct_sec_cust_id text,
    attributes frozen<map<text, text>>);

CREATE TYPE login (
    login_source_id text,
    login_id text,
    attributes frozen<map<text, text>>);

CREATE TABLE consumers_id (
    unique_consumer_id text PRIMARY KEY,
    accounts list<frozen<account>>,  -- UDT nested in LIST
    details map<text, text>,
    dob text,
    background text,
    logins list<frozen<login>>,      -- UDT nested in LIST
    p_id text);


We are currently running performance tests, but we are not entirely
confident that reads will perform well. Since the UDTs are frozen and
stored as BLOBs, will there be any performance impediment when the
coordinator converts (deserializes) them after the read?
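
As a rough illustration of what converting a frozen UDT involves, the sketch below decodes a `frozen<login>` value laid out as in the Cassandra native protocol (v3 and later). Every UDT field is a 4-byte signed length followed by that many bytes (a negative length means null), and a `map<text, text>` is a 4-byte pair count followed by length-prefixed keys and values. The helper names are hypothetical, and the sketch assumes all three fields are present. It only shows that decoding is a linear walk over the blob, so its cost grows with the blob's size.

```python
import struct
from typing import Dict, Optional, Tuple


def read_bytes(buf: bytes, pos: int) -> Tuple[Optional[bytes], int]:
    """Read one [bytes] value at `pos`: a big-endian int32 length, then the payload.
    A negative length encodes null. Returns (payload, next position)."""
    (length,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    if length < 0:
        return None, pos
    return buf[pos:pos + length], pos + length


def decode_text_map(buf: bytes) -> Dict[str, str]:
    """Decode map<text, text>: int32 pair count, then key and value [bytes] per pair.
    Cassandra does not allow null keys or values in a map."""
    (count,) = struct.unpack_from(">i", buf, 0)
    pos = 4
    result: Dict[str, str] = {}
    for _ in range(count):
        key, pos = read_bytes(buf, pos)
        value, pos = read_bytes(buf, pos)
        result[key.decode("utf-8")] = value.decode("utf-8")
    return result


def _text(raw: Optional[bytes]) -> Optional[str]:
    return None if raw is None else raw.decode("utf-8")


def decode_login(blob: bytes) -> Dict[str, object]:
    """Decode a frozen<login> value: one [bytes] per field, in declaration order.
    Assumes all three fields are present in the blob."""
    source_id, pos = read_bytes(blob, 0)
    login_id, pos = read_bytes(blob, pos)
    attributes, pos = read_bytes(blob, pos)
    return {
        "login_source_id": _text(source_id),
        "login_id": _text(login_id),
        "attributes": None if attributes is None else decode_text_map(attributes),
    }


if __name__ == "__main__":
    def field(data: bytes) -> bytes:
        # Encode one [bytes] value (length prefix + payload) for the demo blob.
        return struct.pack(">i", len(data)) + data

    attrs = struct.pack(">i", 1) + field(b"mfa") + field(b"on")
    blob = field(b"web") + field(b"bill") + field(attrs)
    print(decode_login(blob))
```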

If anyone has implemented a similar use case, please let us know your
suggestions.

Thank You,
Bill Walters.

Re: Would a User Defined Type (UDT) nested in a LIST collection column give good read performance?

Posted by Bill Walters <bi...@gmail.com>.

Hi DuyHai,

Thank you for your feedback on our question.
To elaborate on the 2 factors you provided:

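1) Collection cardinality, i.e. the number of elements in the collection. A
maximum of 64,000 elements can be stored (this 64K limit applies to native
protocol v2; protocol v3 and later allow about 2 billion).

2) The size of each element in the collection. The bigger the element (UDT
in your case), the more memory it will require on the coordinator side for
decoding / deserialization. Each UDT element shouldn't exceed 64 KB in size
(again the protocol v2 limit; v3 and later allow up to 2 GB per element).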
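
The two limits above can be enforced on the application side before writing. This is a minimal sketch, assuming the elements have already been serialized to bytes. `collection_budget_violations` and the constants are hypothetical names, and the defaults simply reuse the figures quoted above as conservative budgets, not as limits Cassandra enforces under newer protocol versions.

```python
from typing import List, Sequence

# Hypothetical application-side budgets, reusing the figures quoted above.
# They are deliberately conservative; Cassandra itself does not enforce them
# under native protocol v3+.
MAX_ELEMENTS = 64_000
MAX_ELEMENT_BYTES = 64 * 1024


def collection_budget_violations(
    serialized_elements: Sequence[bytes],
    max_elements: int = MAX_ELEMENTS,
    max_element_bytes: int = MAX_ELEMENT_BYTES,
) -> List[str]:
    """Return human-readable problems; an empty list means the collection is within budget."""
    problems: List[str] = []
    if len(serialized_elements) > max_elements:
        problems.append(f"{len(serialized_elements)} elements exceeds {max_elements}")
    for index, element in enumerate(serialized_elements):
        if len(element) > max_element_bytes:
            problems.append(f"element {index} is {len(element)} bytes, over {max_element_bytes}")
    return problems


if __name__ == "__main__":
    print(collection_budget_violations([b"x" * 10, b"y" * 70_000]))
```

Checking in the writer keeps oversized rows from ever reaching the read path, where they would have to be decoded on every read.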



Thank You,
Bill Walters.

On Mon, Oct 30, 2017 at 3:52 AM, DuyHai Doan <do...@gmail.com> wrote:

> Hello Bill
>
> First, if you don't care about insertion order, it's better to use a Set
> rather than a List. The List implementation requires a read before write
> for some operations.
>
> Second, the read performance of the collection itself depends on 2
> factors:
>
> 1) collection cardinality, i.e. the number of elements in the collection
>
> 2) the size of each element in the collection. The bigger the element (UDT
> in your case), the more memory it will require on the coordinator side for
> decoding / deserialization
>
> If you manage to keep both numbers reasonable, it should be fine.

Re: Would a User Defined Type (UDT) nested in a LIST collection column give good read performance?

Posted by DuyHai Doan <do...@gmail.com>.

Hello Bill

First, if you don't care about insertion order, it's better to use a Set
rather than a List. The List implementation requires a read before write
for some operations.
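
The read-before-write point can be summarised per statement type. The sketch below encodes the behaviour described in the Apache Cassandra CQL documentation for a `list` column such as `logins`: appends, prepends and full overwrites are written without reading, while setting or removing an element by position, and removing occurrences of a value, incur an internal read-before-write. With a `set<frozen<login>>` column, adding (`logins = logins + {...}`) and removing (`logins = logins - {...}`) work by value without that read. The enum and function names are hypothetical; please verify the behaviour against your DSE version.

```python
from enum import Enum
from typing import Dict, Iterable, List


class ListOp(Enum):
    """Hypothetical labels for CQL updates on a list column such as `logins`."""
    APPEND = "UPDATE ... SET logins = logins + [...]"
    PREPEND = "UPDATE ... SET logins = [...] + logins"
    OVERWRITE = "UPDATE ... SET logins = [...]"
    SET_BY_INDEX = "UPDATE ... SET logins[i] = ..."
    DELETE_BY_INDEX = "DELETE logins[i] FROM ..."
    REMOVE_BY_VALUE = "UPDATE ... SET logins = logins - [...]"


# True = Cassandra must first read the current list to locate the affected cells.
NEEDS_READ_BEFORE_WRITE: Dict[ListOp, bool] = {
    ListOp.APPEND: False,
    ListOp.PREPEND: False,
    ListOp.OVERWRITE: False,
    ListOp.SET_BY_INDEX: True,
    ListOp.DELETE_BY_INDEX: True,
    ListOp.REMOVE_BY_VALUE: True,
}


def ops_needing_read(ops: Iterable[ListOp]) -> List[ListOp]:
    """Return the operations in a planned workload that trigger a read before write."""
    return [op for op in ops if NEEDS_READ_BEFORE_WRITE[op]]


if __name__ == "__main__":
    workload = [ListOp.APPEND, ListOp.REMOVE_BY_VALUE, ListOp.SET_BY_INDEX]
    for op in ops_needing_read(workload):
        print(f"read-before-write: {op.value}")
```

If the application only ever appends or replaces whole lists, the read-before-write cost does not arise; it appears once elements are edited or removed in place.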

Second, the read performance of the collection itself depends on 2 factors:

1) collection cardinality, i.e. the number of elements in the collection

2) the size of each element in the collection. The bigger the element (UDT
in your case), the more memory it will require on the coordinator side for
decoding / deserialization

If you manage to keep both numbers reasonable, it should be fine.
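
To see how the two factors combine, the sketch below estimates the encoded size of a `list<frozen<account>>` value using the native protocol v3+ encoding (a 4-byte length prefix on every list element, UDT field and map key/value, plus 4-byte counts for the list and the map). It is an assumption-laden estimate of the bytes a reader has to decode, not of on-disk or heap usage, and the function names are hypothetical. The total grows roughly as the number of elements times the element size.

```python
from typing import Dict, List, Optional


def text_size(value: Optional[str]) -> int:
    """One [bytes] value holding UTF-8 text: 4-byte length prefix + payload (null = prefix only)."""
    return 4 + (0 if value is None else len(value.encode("utf-8")))


def text_map_size(entries: Optional[Dict[str, str]]) -> int:
    """A map<text, text> stored as one UDT field: field prefix + pair count + entries."""
    if entries is None:
        return 4  # null field: length prefix only
    body = 4 + sum(text_size(k) + text_size(v) for k, v in entries.items())
    return 4 + body


def account_size(acct_system_id: Optional[str], acct_id: Optional[str],
                 acct_sec_cust_id: Optional[str],
                 attributes: Optional[Dict[str, str]]) -> int:
    """Encoded size of one frozen<account> value (fields in declaration order)."""
    return (text_size(acct_system_id) + text_size(acct_id)
            + text_size(acct_sec_cust_id) + text_map_size(attributes))


def list_size(element_sizes: List[int]) -> int:
    """A list<...> value: 4-byte element count, then each element length-prefixed."""
    return 4 + sum(4 + size for size in element_sizes)


if __name__ == "__main__":
    one = account_size("sys1", "A123", "C9", {"tier": "gold"})
    for n in (10, 100, 1000):
        print(n, "accounts ->", list_size([one] * n), "bytes")
```

With the sample values, one account encodes to 46 bytes and a list of 1,000 such accounts to about 50 KB; a larger `attributes` map inflates every element and therefore the whole list.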


