Posted to commits@cassandra.apache.org by "Eric Evans (JIRA)" <ji...@apache.org> on 2011/03/04 02:55:37 UTC

[jira] Commented: (CASSANDRA-2027) term definitions

    [ https://issues.apache.org/jira/browse/CASSANDRA-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002396#comment-13002396 ] 

Eric Evans commented on CASSANDRA-2027:
---------------------------------------


The following reboot is the result of a discussion between Gary Dusbabek, Jonathan Ellis, and myself (any errors or misunderstandings are my fault).

h2. Revised definitions (+ semantics)

h3. String

Anything between quotes, _or_ any unquoted alphanumeric value that begins with a letter.

Examples:

{code:style=SQL}
SELECT "0day" FROM cf;
SELECT B_day FROM cf;

UPDATE cf SET "value-low" = "14%" WHERE KEY = "@skinny";
UPDATE cf SET foo = bar WHERE KEY = baz;
{code}
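
To make the rule above concrete, here is a minimal sketch of a recognizer for string terms, written in Python. The names {{STRING_TERM}}, {{is_string_term}} and {{string_value}} are hypothetical, and accepting an underscore in unquoted terms is an assumption taken from the {{B_day}} example rather than from the definition itself; this is not the actual grammar.

```python
import re

# Hypothetical pattern, not the real CQL grammar: a string term is either
# anything between double quotes, or an unquoted run of letters, digits and
# underscores (an assumption based on B_day) that begins with a letter.
STRING_TERM = re.compile(r'"[^"]*"|[A-Za-z][A-Za-z0-9_]*')


def is_string_term(token: str) -> bool:
    """Return True if the whole token has the shape of a string term."""
    return STRING_TERM.fullmatch(token) is not None


def string_value(token: str) -> str:
    """Return the term's text, with surrounding quotes removed if present."""
    if not is_string_term(token):
        raise ValueError(f"not a string term: {token!r}")
    return token[1:-1] if token.startswith('"') else token
```

Under these assumptions {{"0day"}} is accepted because it is quoted, while a bare {{0day}} is rejected because it begins with a digit.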

h3. Integer

An undecorated numeric literal.  How the term is converted node-side is determined by the comparator/validator in use.  For example, {{100}} could be converted to a 4-byte integer or an 8-byte long depending on whether the comparator/validator was an {{IntegerType}} or {{LongType}} respectively.

Examples:

{code:style=SQL}
SELECT 10..100 FROM cf WHERE KEY = "key";
UPDATE cf SET 1000 = "thousand", 100 = "hundred" WHERE KEY = "key";
{code}
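
The comparator-driven conversion described above could be sketched as follows. This is an illustration in Python, not the node-side implementation: {{encode_integer}} and the format table are hypothetical, the byte widths are taken from the description in this comment, and big-endian byte order is an assumption.

```python
import struct

# Hypothetical mapping from comparator/validator name to a struct format.
# Widths follow the description above; big-endian order is an assumption.
_FORMATS = {
    "IntegerType": ">i",  # 4-byte signed integer
    "LongType": ">q",     # 8-byte signed long
}


def encode_integer(literal: str, comparator: str) -> bytes:
    """Encode an undecorated numeric literal as the comparator dictates."""
    fmt = _FORMATS.get(comparator)
    if fmt is None:
        raise ValueError(f"no integer encoding for comparator {comparator!r}")
    # struct.pack raises struct.error if the value does not fit the width.
    return struct.pack(fmt, int(literal))
```

The same literal {{100}} yields 4 bytes under {{IntegerType}} and 8 bytes under {{LongType}}, which is why no {{L}} suffix is needed in the query text.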

h3. UUID

A UUID formatted as a hyphenated hexadecimal string (e.g. {{b137dd10-45b6-11e0-8955-00247ee1f924}}).

Examples:

{code:style=SQL}
SELECT f1fa6c22-45b7-11e0-8955-00247ee1f924 FROM cf WHERE KEY = key;
UPDATE cf SET 0ceb632e-45b8-11e0-8955-00247ee1f924 = 9 WHERE KEY = key;
{code}
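
A rough sketch of recognizing and converting such a term, using Python's standard {{uuid}} module. {{UUID_TERM}} and {{parse_uuid_term}} are hypothetical names, and the pattern is only an assumption about what the lexer would accept.

```python
import re
import uuid

# Hypothetical pattern for an undecorated UUID term: five groups of hex
# digits (8-4-4-4-12) separated by hyphens.
UUID_TERM = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def parse_uuid_term(token: str) -> uuid.UUID:
    """Convert a hyphenated hex term to a UUID, rejecting other shapes."""
    if UUID_TERM.fullmatch(token) is None:
        raise ValueError(f"not a UUID term: {token!r}")
    return uuid.UUID(token)
```

The resulting object's {{bytes}} attribute holds the 16-byte form that would be compared or stored.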

As a special case, when the comparator/validator is {{TimeUUIDType}}, a quoted string literal can be used to supply a parseable timestamp (currently most ISO 8601 variants).

{code:style=SQL}
SELECT "2011-01-01".."2011-02-01" FROM cf WHERE KEY = key;
{code}

_Note: it doesn't make sense to try to query for an individual column using a timestamp like this, because the date-time is only one component of a type 1 UUID (the clock sequence and node make up the rest).  The docs will need to be clear about this._
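
The point of the note can be illustrated with a short Python sketch. {{uuid1_datetime}} is a hypothetical helper, and the second UUID below is made up for the example by changing only the node field of the first.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Type 1 UUID timestamps count 100-nanosecond intervals from the start of
# the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Return the date-time component of a type 1 UUID (microsecond precision)."""
    if u.version != 1:
        raise ValueError(f"not a type 1 UUID: {u}")
    # Ten 100-nanosecond intervals make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# Same timestamp fields, different node: two distinct column names that a
# single date-time cannot tell apart.
first = uuid.UUID("b137dd10-45b6-11e0-8955-00247ee1f924")
second = uuid.UUID("b137dd10-45b6-11e0-8955-000000000001")  # hypothetical
```

Both values map to the same date-time even though they are different UUIDs, so a timestamp works as a slice boundary but cannot name one specific column.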

h3. UTF-8

A double-quoted string literal that is prefixed with a "u" to indicate that it should be encoded to bytes using the utf-8 charset node-side.

Examples:

{code:style=SQL}
SELECT u"name" FROM cf;
UPDATE cf SET u"name" = u"value" WHERE KEY = "key";
{code}

_This one is iffy. Consensus seems to be that the UTF-8 charset should implicitly be used in the conversion to bytes when the comparator/validator is UTF8Type. If that's the case, then the only time this term would do anything useful would be for storing UTF-8 where the comparator/validator is BytesType. That seems enough of a corner case to me to warrant leaving it out entirely._

----

One point of contention during the discussion that spawned this reboot was type inference.  What's proposed above adds some inference (namely for Unicode, decimal values, and some UUID cases), but I'm going to make one more attempt at stopping it there. I'm nothing if not persistent, right? :)

For example, Least Surprise says that {{"10"}} and {{10}} differ in that one is explicitly a string, so converting it to a numeric type with a decimal value of 10 (still) seems wrong to me.  I'd prefer to raise an exception for such mismatches, which also seems like a good way of protecting users from a whole class of bugs.
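
As a sketch of the strict behaviour argued for above (Python, with hypothetical names throughout; {{TermTypeMismatch}} and {{to_numeric}} are not part of any existing API), a quoted term would be rejected for a numeric comparator/validator rather than silently converted:

```python
import re


class TermTypeMismatch(Exception):
    """Raised when a term's shape does not fit the comparator/validator."""


# Comparator names as used in this comment; the set itself is an assumption.
NUMERIC_COMPARATORS = {"IntegerType", "LongType"}

# An undecorated numeric literal: digits only, no quotes, no suffix.
_INTEGER_TERM = re.compile(r"[0-9]+")


def to_numeric(token: str, comparator: str) -> int:
    """Accept only undecorated numeric literals for numeric comparators."""
    if comparator not in NUMERIC_COMPARATORS:
        raise ValueError(f"not a numeric comparator: {comparator!r}")
    if _INTEGER_TERM.fullmatch(token) is None:
        # "10" (quoted) is explicitly a string, so refuse to infer a number.
        raise TermTypeMismatch(f"{token} is not valid for {comparator}")
    return int(token)
```

Here {{10}} converts cleanly, while {{"10"}} raises instead of being coerced.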

I'm also continuing to have a hard time accepting that different rules (syntax and semantics) should exist for column names and values. The general argument for SQL parity is a strong one, and I'm trying to be convinced on this issue (honest), but I keep coming back to the notion that SQL column names are not typed, and that forcing that distinction on Cassandra seems contrived.

> term definitions
> ----------------
>
>                 Key: CASSANDRA-2027
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2027
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API
>    Affects Versions: 0.8
>            Reporter: Eric Evans
>            Assignee: Eric Evans
>            Priority: Minor
>              Labels: cql
>             Fix For: 0.8
>
>         Attachments: v1-0001-CASSANDRA-2027-utf8-and-integer-term-types.txt, v1-0002-column-name-validation.txt, v1-0003-system-tests-for-integer-and-utf8-term-types.txt, v1-0004-uuid-term-definitions.txt, v1-0005-missed-doc-update-for-utf8-term-type.txt
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> h3. String
> Anything between double-quotes.  Node-side this is just converted to bytes, so it could really be used to represent *any* type so long as it is appropriately encoded.
> Examples:
> {code:style=SQL}
> SELECT "name" FROM cf;
> UPDATE cf SET "name" = "value" WHERE KEY = "key";
> {code}
> h3. UTF-8
> A double-quoted string literal that is prefixed with a "u" to indicate that it should be encoded to bytes using the utf-8 charset node-side.
> Examples:
> {code:style=SQL}
> SELECT u"name" FROM cf;
> UPDATE cf SET u"name" = u"value" WHERE KEY = "key";
> {code}
> h3. Integer
> An undecorated numeric literal, converted to a 4-byte int node-side.
> Examples:
> {code:style=SQL}
> SELECT 10..100 FROM cf WHERE KEY = "key";
> UPDATE cf SET 1000 = "thousand", 100 = "hundred" WHERE KEY = "key";
> {code}
> h3. Long
> A numeric literal suffixed with an "L", converted to an 8-byte long node-side.
> Examples:
> {code:style=SQL}
> SELECT 10L..100L FROM cf WHERE KEY = "key";
> UPDATE cf SET 1000L = "thousand", 100L = "hundred" WHERE KEY = "key";
> {code}
> h3. UUID
> A string-formatted UUID supplied as an "argument" to a ctor/function formatted string ({{uuid(<uuid string>)}}).  Node-side this is converted back to the corresponding UUID.
> Examples:
> {code:style=SQL}
> SELECT uuid(5f989e95-ae07-4425-b84a-6876ba106c66) FROM cf WHERE KEY = "key";
> UPDATE cf SET uuid(5621b93d-d3a2-4d22-8a59-bdb93202b6cb)  = "username" WHERE KEY = "key";
> {code}
> h3. TimeUUID (UUID Type 1)
> A string-formatted time-based UUID (type 1) supplied as an "argument" to a ctor/function formatted string ({{timeuuid(<uuid string>)}}).  Node-side this is converted back to the corresponding UUID.  In addition to a string-formatted UUID, it should also be possible to supply dates in a variety of formats which will result in a new UUID being created node-side.
> Examples:
> {code:style=SQL}
> SELECT timeuuid(2011-01-01)..timeuuid(2010-01-21) FROM cf WHERE KEY = "key";
> UPDATE cf SET timeuuid(now) = 1000L  WHERE KEY = "key";
> {code}
