Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2010/02/05 08:50:27 UTC

[jira] Created: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Row keys should be byte[]s, not Strings
---------------------------------------

                 Key: CASSANDRA-767
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-767
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Stu Hood
            Priority: Critical
             Fix For: 0.7


This issue has come up numerous times, and we've dealt with a lot of pain because of it: let's get it knocked out.

Keys being Java Strings can make it painful to use Cassandra from other languages, encoding binary data like integers as Strings is very inefficient, and there is a disconnect between our column data types and the plain String treatment we give row keys.

The key design decision that needs discussion is: Should we apply the column AbstractTypes to row keys? If so, how do Partitioners change?
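To put numbers on the inefficiency described above, here is a minimal Java sketch. The class and method names are illustrative only and are not Cassandra code. It encodes the same long as a decimal String and as 8 big-endian bytes, and shows how decimal String keys mis-sort unless they are zero-padded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyEncodingDemo {
    // String key: one UTF-8 byte per decimal digit, so the width varies with the value.
    static byte[] asStringKey(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Binary key: always 8 bytes, big-endian. Unsigned byte order matches numeric
    // order only for non-negative values; negatives would need the sign bit flipped.
    static byte[] asBinaryKey(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        long id = 1234567890123456789L;
        System.out.println(asStringKey(id).length); // 19
        System.out.println(asBinaryKey(id).length); // 8

        // String keys sort character by character, so "10" and "100" land before "9"
        // unless every key is zero-padded to a common width.
        String[] keys = {"9", "10", "100"};
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys)); // [10, 100, 9]
    }
}
```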

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841592#action_12841592 ] 

Stu Hood edited comment on CASSANDRA-767 at 3/4/10 11:11 PM:
-------------------------------------------------------------

> An alternative? Instead of AT defining a comparator, have it define a collation id generator
Works for me... it's similar to what we already do with BytesToken in COPP.

EDIT: Hmm, except we would still need to solve the padding problems within the collation id.
EDIT2: Unless the collation id supports the compound key concept somehow by having sections.
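One way the "sections" idea could work, sketched in Java under assumptions made purely for illustration (the CompoundKey class and the escape scheme are hypothetical and were not settled in this thread): escape each 0x00 inside a section as 0x00 0xFF and end every section with 0x00 0x01. Unsigned lexicographic order of the result then follows section-by-section order with no fixed-width padding. Typed sections such as signed numbers would still need their own order-preserving encoding first. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CompoundKey {
    // Concatenates sections into one collation id. Inside a section every 0x00 becomes
    // 0x00 0xFF, and each section ends with the terminator 0x00 0x01. Because 0x01 < 0xFF,
    // a section that ends sorts before a longer section sharing its prefix, so comparing
    // the whole id with unsigned byte order compares the sections one by one.
    static byte[] encode(byte[]... sections) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] section : sections) {
            for (byte b : section) {
                out.write(b);
                if (b == 0x00) {
                    out.write(0xFF); // escape an embedded zero byte
                }
            }
            out.write(0x00); // section terminator, first byte
            out.write(0x01); // section terminator, second byte
        }
        return out.toByteArray();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain concatenation would map both tuples to "abc"; the terminators keep them apart.
        byte[] k1 = encode(utf8("ab"), utf8("c"));
        byte[] k2 = encode(utf8("a"), utf8("bc"));
        System.out.println(Arrays.compareUnsigned(k2, k1) < 0); // true: "a" < "ab"
    }
}
```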

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841615#action_12841615 ] 

Jonathan Ellis commented on CASSANDRA-767:
------------------------------------------

right, it's basically the decorate-sort-undecorate / Schwartzian transform pattern.
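For readers unfamiliar with the pattern, a minimal Java sketch of decorate-sort-undecorate: compute each key's sort bytes once, sort on those bytes alone, then strip them off. The class and method names are hypothetical, and the case-insensitive decoration is just an example of a decoration function. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class DecorateSortUndecorate {
    // Pairs an original key with the bytes it will be sorted by.
    static final class Decorated<T> {
        final T key;
        final byte[] sortBytes;

        Decorated(T key, byte[] sortBytes) {
            this.key = key;
            this.sortBytes = sortBytes;
        }
    }

    static <T> List<T> sort(List<T> keys, Function<T, byte[]> decorator) {
        // Decorate: compute each key's sort bytes exactly once.
        List<Decorated<T>> decorated = new ArrayList<>();
        for (T k : keys) {
            decorated.add(new Decorated<>(k, decorator.apply(k)));
        }
        // Sort: look only at the bytes, compared as unsigned values.
        decorated.sort((x, y) -> Arrays.compareUnsigned(x.sortBytes, y.sortBytes));
        // Undecorate: keep the original keys in their new order.
        List<T> result = new ArrayList<>();
        for (Decorated<T> d : decorated) {
            result.add(d.key);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example decoration: lower-cased UTF-8 bytes give a case-insensitive order.
        Function<String, byte[]> caseInsensitive =
                s -> s.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
        System.out.println(sort(Arrays.asList("banana", "Apple", "cherry"), caseInsensitive));
        // prints [Apple, banana, cherry]
    }
}
```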

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841567#action_12841567 ] 

Jonathan Ellis commented on CASSANDRA-767:
------------------------------------------

that is still outside the scope of 767/16, btw, but i think it's more doable than a comparator-based approach.

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841565#action_12841565 ] 

Jonathan Ellis commented on CASSANDRA-767:
------------------------------------------

then you get back to needing partitioner+token per CF.  that's totally outside the scope for 0.7 and probably 0.8.

An alternative?  Instead of AT defining a comparator, have it define a collation id generator, where it generates a byte[] that gives it "the right sort" when done lexicographically.  then you could make that part of key decoration and it Just Works.
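A rough Java sketch of what a collation id generator might look like. The CollationIdGenerator interface and both implementations are hypothetical and are not part of Cassandra's AbstractType API; they only illustrate producing bytes whose unsigned lexicographic order matches the type's natural order, which a plain byte-wise comparison can then use directly. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical interface for illustration; not part of Cassandra.
interface CollationIdGenerator<T> {
    // Returns bytes whose unsigned lexicographic order matches T's natural order.
    byte[] collationId(T value);
}

// Signed 64-bit integers: flipping the sign bit moves negatives below positives,
// and big-endian layout makes the most significant byte compare first.
class LongCollation implements CollationIdGenerator<Long> {
    @Override
    public byte[] collationId(Long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }
}

// Strings: UTF-8 byte order already equals Unicode code point order, so the encoded
// bytes can be used as-is. (Java's String.compareTo orders by UTF-16 code units,
// which can differ for characters outside the Basic Multilingual Plane.)
class Utf8Collation implements CollationIdGenerator<String> {
    @Override
    public byte[] collationId(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}

class CollationDemo {
    public static void main(String[] args) {
        CollationIdGenerator<Long> longs = new LongCollation();
        // With the collation id, -1 sorts before 2 as it should.
        System.out.println(Arrays.compareUnsigned(longs.collationId(-1L), longs.collationId(2L)) < 0); // true

        // Raw two's-complement bytes get it wrong: -1 is 0xFF...FF.
        byte[] rawMinusOne = ByteBuffer.allocate(Long.BYTES).putLong(-1L).array();
        byte[] rawTwo = ByteBuffer.allocate(Long.BYTES).putLong(2L).array();
        System.out.println(Arrays.compareUnsigned(rawMinusOne, rawTwo) < 0); // false
    }
}
```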

[jira] Issue Comment Edited: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841592#action_12841592 ] 

Stu Hood edited comment on CASSANDRA-767 at 3/4/10 11:04 PM:
-------------------------------------------------------------

> An alternative? Instead of AT defining a comparator, have it define a collation id generator
Works for me... it's similar to what we already do with BytesToken in COPP.

EDIT: Hmm, except we would still need to solve the padding problems within the collation id.

[jira] Issue Comment Edited: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841559#action_12841559 ] 

Stu Hood edited comment on CASSANDRA-767 at 3/4/10 10:10 PM:
-------------------------------------------------------------

Based on the difficulties encountered here, and never wanting to run into them, I think we should allow AbstractTypes per ColumnFamily. This would allow us to do fun stuff like compound keys for our views without throwing away type information and needing to do nasty padding.

EDIT: er... here: http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/
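Judging by its title, the linked post is about mapping strings, numbers and dates onto bytes. As a hedged Java sketch of the usual tricks for numbers and dates (OrderedBytes is a hypothetical name, and these encodings are common techniques rather than anything taken from the post or from Cassandra): flip the sign bit of a long, handle doubles by flipping either the sign bit or all bits, and store dates as epoch milliseconds through the long encoding. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.time.Instant;
import java.util.Arrays;

class OrderedBytes {
    // Signed long: flip the sign bit, then write big-endian.
    static byte[] ofLong(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Double: IEEE 754 bits already grow with magnitude. For non-negative values,
    // flipping the sign bit lifts them above all negatives; for negative values,
    // inverting every bit also reverses their order so larger magnitudes sort first.
    static byte[] ofDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        bits = (bits < 0) ? ~bits : (bits ^ Long.MIN_VALUE);
        return ByteBuffer.allocate(Long.BYTES).putLong(bits).array();
    }

    // Date: milliseconds since the epoch through the long encoding,
    // so instants before 1970 (negative millis) still sort first.
    static byte[] ofInstant(Instant t) {
        return ofLong(t.toEpochMilli());
    }

    public static void main(String[] args) {
        double[] xs = {-10.5, -0.25, 0.0, 3.0, 1e9};
        for (int i = 0; i + 1 < xs.length; i++) {
            // Each adjacent pair compares as "less than", so this prints true four times.
            System.out.println(Arrays.compareUnsigned(ofDouble(xs[i]), ofDouble(xs[i + 1])) < 0);
        }
        Instant before = Instant.parse("1969-12-31T00:00:00Z");
        Instant after = Instant.parse("2010-02-17T00:00:00Z");
        System.out.println(Arrays.compareUnsigned(ofInstant(before), ofInstant(after)) < 0); // true
    }
}
```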

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841592#action_12841592 ] 

Stu Hood commented on CASSANDRA-767:
------------------------------------

> An alternative? Instead of AT defining a comparator, have it define a collation id generator
Works for me... it's similar to what we already do with BytesToken in COPP.

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830103#action_12830103 ] 

Jonathan Ellis commented on CASSANDRA-767:
------------------------------------------

nothing wrong with making everything the equivalent of BytesType to start with and adding support for others later if it turns out to be useful (imo: it won't)
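If every row key is treated like BytesType, ordering reduces to comparing raw bytes. A small Java sketch, assuming BytesType-style order means unsigned byte-wise comparison with a shorter prefix sorting first (BytesKeys is a hypothetical name). It also shows a practical wrinkle of byte[] keys in Java: arrays use identity equals/hashCode, so hash-based maps do not find keys with equal contents.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class BytesKeys {
    // Unsigned, byte-wise lexicographic order; a shorter key sorts before any key it prefixes.
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // byte[] uses identity equals/hashCode, so a HashMap cannot find an equal-content key.
        Map<byte[], String> hashed = new HashMap<>();
        hashed.put(utf8("row1"), "a");
        System.out.println(hashed.get(utf8("row1"))); // null

        // A TreeMap with an explicit comparator compares contents instead.
        Map<byte[], String> sorted = new TreeMap<>(BYTES_ORDER);
        sorted.put(utf8("row1"), "a");
        sorted.put(new byte[] {(byte) 0xFF}, "b"); // would sort first if bytes were compared as signed
        System.out.println(sorted.get(utf8("row1"))); // a
        System.out.println(sorted.values()); // [a, b]
    }
}
```

On Java 9 and later, Arrays::compareUnsigned gives the same order and could replace the hand-written loop.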

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841559#action_12841559 ] 

Stu Hood commented on CASSANDRA-767:
------------------------------------

Based on the difficulties encountered here, and never wanting to run into them, I think we should allow AbstractTypes per ColumnFamily. This would allow us to do fun stuff like compound keys for our views without throwing away type information and needing to do nasty padding.
