You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org> on 2012/01/03 19:44:41 UTC

[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns

    [ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178902#comment-13178902 ] 

Sylvain Lebresne commented on CASSANDRA-2474:
---------------------------------------------



Ok, I think I'm really warming up to what we're getting at. I would change the syntax of the VALUE() thing however. Instead of:
{noformat}
CREATE TABLE timeline (
   userid int,
   posted_at uuid,
   body string,
   PRIMARY KEY(user_id, posted_at),
   VALUE(body)
)
{noformat}
I would prefer:
{noformat}
CREATE COMPACT TABLE timeline (
   userid int,
   posted_at uuid,
   body string,
   PRIMARY KEY(user_id, posted_at),
)
{noformat}

The reasons being that it really influences the implementation layout of the CF in C*. Namely, the non-compact CF defined by 
{noformat}
CREATE TABLE timeline (
   userid int,
   posted_at uuid,
   body string,
   PRIMARY KEY(user_id, posted_at),
)
{noformat}
would look in C* like:
{noformat}
<userid> : {
    <posted_at>:'body' -> <value>
}
{noformat}
while the COMPACT variant would be:
{noformat}
<userid> : {
    <posted_at> -> <value>
}
{noformat}
which is using the fact that there is only 1 field not part of the key to "optimize" the layout. And I believe making the COMPACT part of the CREATE emphasis better that it's a property of the definition itself (that cannot be changed) rather than of that specific 'body' field. It also make the rule for COMPACT table rather simple: "a compact table should have only one field not part of the primary key"; you don't have to deal with errors like someone defining two VALUE() for instance.


That being said, I'd like to try to resume where we're at (including the COMPACT change above) and add a few random ideas along the way. Please correct me if I've got something wrong.

I think we have 4 different cases, 2 for 'standard' CF without composites:
* static CFs (the only case CQL handle really well today)
* dynamic CFs (wide rows, time series if you prefer) and 2 for CF with composite column names:
* 'dense' composite (typically time series but where the key is naturally multi-parts)
* 'sparse' composite (aka super columns)

Let me try to take an example for which, with how it would translate internally and example queries.

h3. Standard "static" CF

"For each user, holds his infos"
{noformat}
CREATE TABLE users (
    userid uuid PRIMARY KEY,
    firstname text,
    lastname text,
    age int
)
{noformat}

In C*:
{noformat}
<userid> : {
    'firstname' -> <value>
    'lastname' -> <value>
    'age' -> <value>
}
{noformat}

Query:
{noformat}
SELECT firstname, lastname FROM users WHERE userid = '...';
{noformat}

h3. Standard "dynamic" CF

"For each user, keep each url he clicked on with the date of last click"

{noformat}
CREATE COMPACT TABLE clicks (
    userid uuid,
    url text,
    timestamp date
    PRIMARY KEY (userid, url)
)
{noformat}

In C*:
{noformat}
<userid> : {
    <url> -> <timestamp>
}
{noformat}

Query:
{noformat}
SELECT url, timestamp FROM clicks WHERE userid = '...';
SELECT timestamp FROM clicks WHERE userid = '...' and url = 'http://...';
{noformat}

h3. 'dense' composite

"For each user, keep ip and port from where he connected with the date of last
connection"

{noformat}
CREATE COMPACT TABLE connections (
    userid uuid,
    ip binary,
    port int,
    timestamp date
    PRIMARY KEY (userid, ip, port)
)
{noformat}

In C*:
{noformat}
<userid> : {
    <ip>:<port> -> <timestamp>
}
{noformat}

Query:
{noformat}
SELECT ip, port, timestamp FROM connections WHERE userid = '...';
{noformat}

h3. 'sparse' composite

"User timeline"

{noformat}
CREATE TABLE timeline (
    userid uuid,
    posted_at date,
    body text,
    posted_by text,
    PRIMARY KEY (user_id, posted_at),
);
{noformat}

In C*:
{noformat}
<userid> : {
    <posted_at>:'body' -> <value>
    <posted_at>:'posted_by' -> <value>
}
{noformat}

Query:
{noformat}
SELECT body, posted_by FROM timeline WHERE userid = '...' and posted_at = '2 janvier 2010'
{noformat}

Note: I think we really should also be able to do queries like:
{noformat}
SELECT posted_ad, body, posted_by FROM timeline WHERE userid = '...' and posted_at > '2 janvier 2010'
{noformat}
but that's more akin to the modification of the syntax for slices.


h3. Random other ideas

# We could allow something like:
{noformat}
CONSTRAINT key PRIMARY KEY (userid, ip, port)
{noformat}
which would then allow to write
{noformat}
SELECT timestamp FROM users WHERE key = ('...', 192.168.0.1, 80);
{noformat}
(I believe this is the 'standard' notation to name a 'composite' key in SQL)
# Above we're ony handling the use of composites for column names, but they can be useful for value (and row keys) and it could be nice to have an easy notation for that (clearly a following ticket however). What about:
{noformat}
CREATE COMPACT TABLE timeline (
    userid_part1 text,
    userid_part2 int,
    posted_at date,
    posted_by uuid,
    body text
    header text
    GROUP (userid_part1, userid_part2) AS userid,
    PRIMARY KEY (userid, posted_at, posted_by)
    GROUP (header, body)
)
{noformat}
In C*:
{noformat}
<userid_part1>:<userid_part2> : {
    <posted_at>:<posted_by> -> <header>:<body>
}
{noformat}
Query:
{noformat}
SELECT posted_at, posted_by, body, header FROM timeline WHERE userid = ('john', 32)
{noformat}

                
> CQL support for compound columns
> --------------------------------
>
>                 Key: CASSANDRA-2474
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Eric Evans
>            Assignee: Pavel Yaskevich
>              Labels: cql
>             Fix For: 1.1
>
>         Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of compound column names (the CQL syntax is colon-delimted terms), and then teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira