You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by "Alaa Zubaidi (PDF)" <al...@pdf.com> on 2015/05/11 20:32:29 UTC

CQL Data Model question

Hi,

I am trying to port an Oracle Table to Cassandra.
the table is a wide table (931 columns) and could have millions of rows.
 name, filter1, filter2....filter30, data1, data2...data900

The user would retrieve multiple rows from this table and filter (30 filter
columns) by one or more (up to 3) of the filter columns, it could be any of
the filter columns.
(select * from table1 where name = .. and filter1 = .. and filter5= .. ;)

What is the best design for this in Cassandra/CQL?

I tried the following:
Create table tab1 (
name text,
flt1 text,
flt2 text,
flt3 text,
..
flt30 text,
data text,
PRIMARY KEY (name, flt1, flt2, flt3, ..... flt30) );

Is there any side effects of having 30 composite keys?

Thanks

-- 
*This message may contain confidential and privileged information. If it 
has been sent to you in error, please reply to advise the sender of the 
error and then immediately permanently delete it and all attachments to it 
from your systems. If you are not the intended recipient, do not read, 
copy, disclose or otherwise use this message or any attachments to it. The 
sender disclaims any liability for such unauthorized use. PLEASE NOTE that 
all incoming e-mails sent to PDF e-mail accounts will be archived and may 
be scanned by us and/or by external service providers to detect and prevent 
threats to our systems, investigate illegal or inappropriate behavior, 
and/or eliminate unsolicited promotional e-mails (“spam”). If you have any 
concerns about this process, please contact us at *
*legal.department@pdf.com* <le...@pdf.com>*.*

Re: CQL Data Model question

Posted by "Alaa Zubaidi (PDF)" <al...@pdf.com>.
Thanks Ngoc, Jack

On Tue, May 12, 2015 at 4:56 AM, Ngoc Minh VO <ng...@bnpparibas.com>
wrote:

>  Hello,
>
>
>
> The problem with your approach is: you will need to specify all the 30
> filters (in the pre-defined order in PK) when querying.
>
>
>
> I would go for this data model:
>
> CREATE TABLE t (
>
>     name text,
>
>     filter_name1 text, filter_value1 text,
>
>     filter_name2 text, filter_value2 text,
>
>     filter_name3 text, filter_value3 text, -- since you only have up to 3
> filters in one query
>
>     PRIMARY KEY (name, f_n1, f_v1, f_n2, f_v2, f_n3, f_v3)
>
> );
>
>
>
> And denormalize the data when you import to the table :
>
> One line in Oracle table with K filters become C(3, K) lines in C* table.
>
>
>
> Best regards,
>
> Minh
>
>
>
> *From:* Alaa Zubaidi (PDF) [mailto:alaa.zubaidi@pdf.com]
> *Sent:* lundi 11 mai 2015 20:32
> *To:* user@cassandra.apache.org
> *Subject:* CQL Data Model question
>
>
>
> Hi,
>
>
>
> I am trying to port an Oracle Table to Cassandra.
>
> the table is a wide table (931 columns) and could have millions of rows.
>
>  name, filter1, filter2....filter30, data1, data2...data900
>
>
>
> The user would retrieve multiple rows from this table and filter (30
> filter columns) by one or more (up to 3) of the filter columns, it could be
> any of the filter columns.
>
> (select * from table1 where name = .. and filter1 = .. and filter5= .. ;)
>
>
>
> What is the best design for this in Cassandra/CQL?
>
>
>
> I tried the following:
>
> Create table tab1 (
>
> name text,
>
> flt1 text,
>
> flt2 text,
>
> flt3 text,
>
> ..
>
> flt30 text,
>
> data text,
>
> PRIMARY KEY (name, flt1, flt2, flt3, ..... flt30) );
>
>
>
> Is there any side effects of having 30 composite keys?
>
>
>
> Thanks
>
>
> *This message may contain confidential and privileged information. If it
> has been sent to you in error, please reply to advise the sender of the
> error and then immediately permanently delete it and all attachments to it
> from your systems. If you are not the intended recipient, do not read,
> copy, disclose or otherwise use this message or any attachments to it. The
> sender disclaims any liability for such unauthorized use. PLEASE NOTE that
> all incoming e-mails sent to PDF e-mail accounts will be archived and may
> be scanned by us and/or by external service providers to detect and prevent
> threats to our systems, investigate illegal or inappropriate behavior,
> and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
> concerns about this process, please contact us at *
> *legal.department@pdf.com* <le...@pdf.com>*.*
>
> This message and any attachments (the "message") is
> intended solely for the intended addressees and is confidential.
> If you receive this message in error,or are not the intended recipient(s),
> please delete it and any copies from your systems and immediately notify
> the sender. Any unauthorized view, use that does not comply with its
> purpose,
> dissemination or disclosure, either whole or partial, is prohibited. Since
> the internet
> cannot guarantee the integrity of this message which may not be reliable,
> BNP PARIBAS
> (and its subsidiaries) shall not be liable for the message if modified,
> changed or falsified.
> Do not print this message unless it is necessary,consider the environment.
>
>
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Ce message et toutes les pieces jointes (ci-apres le "message")
> sont etablis a l'intention exclusive de ses destinataires et sont
> confidentiels.
> Si vous recevez ce message par erreur ou s'il ne vous est pas destine,
> merci de le detruire ainsi que toute copie de votre systeme et d'en avertir
> immediatement l'expediteur. Toute lecture non autorisee, toute utilisation
> de
> ce message qui n'est pas conforme a sa destination, toute diffusion ou
> toute
> publication, totale ou partielle, est interdite. L'Internet ne permettant
> pas d'assurer
> l'integrite de ce message electronique susceptible d'alteration, BNP
> Paribas
> (et ses filiales) decline(nt) toute responsabilite au titre de ce message
> dans l'hypothese
> ou il aurait ete modifie, deforme ou falsifie.
> N'imprimez ce message que si necessaire, pensez a l'environnement.
>



-- 

Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 1000
San Jose, CA 95110  USA
Tel: 408-283-5639
fax: 408-938-6479
email: alaa.zubaidi@pdf.com

-- 
*This message may contain confidential and privileged information. If it 
has been sent to you in error, please reply to advise the sender of the 
error and then immediately permanently delete it and all attachments to it 
from your systems. If you are not the intended recipient, do not read, 
copy, disclose or otherwise use this message or any attachments to it. The 
sender disclaims any liability for such unauthorized use. PLEASE NOTE that 
all incoming e-mails sent to PDF e-mail accounts will be archived and may 
be scanned by us and/or by external service providers to detect and prevent 
threats to our systems, investigate illegal or inappropriate behavior, 
and/or eliminate unsolicited promotional e-mails (“spam”). If you have any 
concerns about this process, please contact us at *
*legal.department@pdf.com* <le...@pdf.com>*.*

RE: CQL Data Model question

Posted by Ngoc Minh VO <ng...@bnpparibas.com>.
Hello,

The problem with your approach is: you will need to specify all the 30 filters (in the pre-defined order in PK) when querying.

I would go for this data model:
CREATE TABLE t (
    name text,
    filter_name1 text, filter_value1 text,
    filter_name2 text, filter_value2 text,
    filter_name3 text, filter_value3 text, -- since you only have up to 3 filters in one query
    PRIMARY KEY (name, f_n1, f_v1, f_n2, f_v2, f_n3, f_v3)
);

And denormalize the data when you import to the table :
One line in Oracle table with K filters become C(3, K) lines in C* table.

Best regards,
Minh

From: Alaa Zubaidi (PDF) [mailto:alaa.zubaidi@pdf.com]
Sent: lundi 11 mai 2015 20:32
To: user@cassandra.apache.org
Subject: CQL Data Model question

Hi,

I am trying to port an Oracle Table to Cassandra.
the table is a wide table (931 columns) and could have millions of rows.
 name, filter1, filter2....filter30, data1, data2...data900

The user would retrieve multiple rows from this table and filter (30 filter columns) by one or more (up to 3) of the filter columns, it could be any of the filter columns.
(select * from table1 where name = .. and filter1 = .. and filter5= .. ;)

What is the best design for this in Cassandra/CQL?

I tried the following:
Create table tab1 (
name text,
flt1 text,
flt2 text,
flt3 text,
..
flt30 text,
data text,
PRIMARY KEY (name, flt1, flt2, flt3, ..... flt30) );

Is there any side effects of having 30 composite keys?

Thanks

This message may contain confidential and privileged information. If it has been sent to you in error, please reply to advise the sender of the error and then immediately permanently delete it and all attachments to it from your systems. If you are not the intended recipient, do not read, copy, disclose or otherwise use this message or any attachments to it. The sender disclaims any liability for such unauthorized use. PLEASE NOTE that all incoming e-mails sent to PDF e-mail accounts will be archived and may be scanned by us and/or by external service providers to detect and prevent threats to our systems, investigate illegal or inappropriate behavior, and/or eliminate unsolicited promotional e-mails (“spam”). If you have any concerns about this process, please contact us at legal.department@pdf.com<ma...@pdf.com>.


This message and any attachments (the "message") is
intended solely for the intended addressees and is confidential. 
If you receive this message in error,or are not the intended recipient(s), 
please delete it and any copies from your systems and immediately notify
the sender. Any unauthorized view, use that does not comply with its purpose, 
dissemination or disclosure, either whole or partial, is prohibited. Since the internet 
cannot guarantee the integrity of this message which may not be reliable, BNP PARIBAS 
(and its subsidiaries) shall not be liable for the message if modified, changed or falsified. 
Do not print this message unless it is necessary,consider the environment.

----------------------------------------------------------------------------------------------------------------------------------

Ce message et toutes les pieces jointes (ci-apres le "message") 
sont etablis a l'intention exclusive de ses destinataires et sont confidentiels.
Si vous recevez ce message par erreur ou s'il ne vous est pas destine,
merci de le detruire ainsi que toute copie de votre systeme et d'en avertir
immediatement l'expediteur. Toute lecture non autorisee, toute utilisation de 
ce message qui n'est pas conforme a sa destination, toute diffusion ou toute 
publication, totale ou partielle, est interdite. L'Internet ne permettant pas d'assurer
l'integrite de ce message electronique susceptible d'alteration, BNP Paribas 
(et ses filiales) decline(nt) toute responsabilite au titre de ce message dans l'hypothese
ou il aurait ete modifie, deforme ou falsifie. 
N'imprimez ce message que si necessaire, pensez a l'environnement.

Re: CQL Data Model question

Posted by Jack Krupansky <ja...@gmail.com>.
Porting an SQL data model to Cassandra is an anti-pattern - don't do it!
Instead, focus on developing a new data model that capitalizes on the key
strengths of Cassandra - distributed, scalable, fast writes, fast direct
access. Complex and ad-hoc queries are anti-patterns as well. I'll leave it
to DataStax staff to decide whether they feel that DSE Search/Solr can help
you on the complex and ad-hoc query front.

Tell us more about what business problem your SQL data model was trying to
accomplish. Get the business requirements sorted out before planning an
implementation.

You should also consider taking Cassandra data modeling training before
diving in yourself with a data model.

-- Jack Krupansky

On Mon, May 11, 2015 at 2:32 PM, Alaa Zubaidi (PDF) <al...@pdf.com>
wrote:

> Hi,
>
> I am trying to port an Oracle Table to Cassandra.
> the table is a wide table (931 columns) and could have millions of rows.
>  name, filter1, filter2....filter30, data1, data2...data900
>
> The user would retrieve multiple rows from this table and filter (30
> filter columns) by one or more (up to 3) of the filter columns, it could be
> any of the filter columns.
> (select * from table1 where name = .. and filter1 = .. and filter5= .. ;)
>
> What is the best design for this in Cassandra/CQL?
>
> I tried the following:
> Create table tab1 (
> name text,
> flt1 text,
> flt2 text,
> flt3 text,
> ..
> flt30 text,
> data text,
> PRIMARY KEY (name, flt1, flt2, flt3, ..... flt30) );
>
> Is there any side effects of having 30 composite keys?
>
> Thanks
>
> *This message may contain confidential and privileged information. If it
> has been sent to you in error, please reply to advise the sender of the
> error and then immediately permanently delete it and all attachments to it
> from your systems. If you are not the intended recipient, do not read,
> copy, disclose or otherwise use this message or any attachments to it. The
> sender disclaims any liability for such unauthorized use. PLEASE NOTE that
> all incoming e-mails sent to PDF e-mail accounts will be archived and may
> be scanned by us and/or by external service providers to detect and prevent
> threats to our systems, investigate illegal or inappropriate behavior,
> and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
> concerns about this process, please contact us at *
> *legal.department@pdf.com* <le...@pdf.com>*.*