Posted to commits@cassandra.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2010/08/01 13:42:18 UTC

[jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings

    [ https://issues.apache.org/jira/browse/CASSANDRA-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894370#action_12894370 ] 

Uwe Schindler commented on CASSANDRA-767:
-----------------------------------------

About Lucandra: Currently all keys in Lucene are valid UTF-8 encoded bytes, so making them Strings in Cassandra is fine. As Todd Nine said, this also holds for numeric terms: they use only 7 bits of each byte, so they are valid UTF-8. (There was still a bug in Cassandra that trimmed keys, but it is now solved.)
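To illustrate the 7-bit point, here is a hypothetical encoder. The class and method names are invented for this sketch, and it is not Lucene's actual NumericUtils format. It splits a long into ten 7-bit groups, so every byte stays below 0x80. That makes the key plain ASCII, which survives a round trip through a Java String. The sketch also shows why trimming matters. The groups can contain bytes at or below 0x20, and Java's String.trim() strips those from both ends.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a 7-bit numeric term encoding; NOT Lucene's NumericUtils.
class SevenBitKeys {
    // Encode a long as ten 7-bit groups, most significant group first.
    // Flipping the sign bit makes negative values sort before positive ones.
    static byte[] encode(long value) {
        long bits = value ^ Long.MIN_VALUE;
        byte[] out = new byte[10]; // ceil(64 / 7) = 10 groups
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = (byte) (bits & 0x7F); // the high bit of every byte is 0
            bits >>>= 7;
        }
        return out;
    }

    static long decode(byte[] key) {
        long bits = 0;
        for (byte b : key) {
            bits = (bits << 7) | (b & 0x7F);
        }
        return bits ^ Long.MIN_VALUE;
    }

    // Bytes below 0x80 are ASCII, hence valid UTF-8, so String conversion is lossless.
    static boolean survivesStringRoundTrip(byte[] key) {
        String s = new String(key, StandardCharsets.UTF_8);
        return Arrays.equals(key, s.getBytes(StandardCharsets.UTF_8));
    }

    // String.trim() removes leading and trailing chars <= U+0020, which these keys may contain.
    static boolean survivesTrim(byte[] key) {
        String s = new String(key, StandardCharsets.UTF_8);
        return s.trim().equals(s);
    }

    public static void main(String[] args) {
        for (long v : new long[] {Long.MIN_VALUE, -1L, 0L, 42L, Long.MAX_VALUE}) {
            byte[] key = encode(v);
            System.out.printf("%d: decodes=%b, string round trip=%b, survives trim=%b%n",
                    v, decode(key) == v, survivesStringRoundTrip(key), survivesTrim(key));
        }
    }
}
```

In this sketch the first group holds only one bit, so it is always 0x00 or 0x01. As a result, every key is valid UTF-8, yet every key is altered by trim().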

Lucene trunk has now migrated to pure byte[] terms, so Lucandra will do the same. It is therefore no longer guaranteed that the terms in a Lucene index are representable as Strings. Also, several Lucene queries only work correctly if keys are ordered as native unsigned byte[], not by UTF-16 code units (String.compareTo()).
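Here is a minimal sketch of how the two orderings can disagree. The helper names are hypothetical, and the comment does not say which Lucene queries depend on this. Unsigned lexicographic comparison of UTF-8 bytes matches Unicode code point order. String.compareTo(), by contrast, compares UTF-16 code units. As a result, a BMP character above the surrogate range, such as U+FF01, sorts after a supplementary character such as U+1F600 as a String, but before it as bytes. Java's byte type is also signed, so a naive signed comparison puts non-ASCII bytes (0x80 to 0xFF) before ASCII ones. Arrays.compareUnsigned and Arrays.compare on byte[] require Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper names; compares the same keys under three orderings.
class KeyOrdering {
    // UTF-16 code unit order, as used by String.compareTo().
    static int utf16Order(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Unsigned lexicographic order of the UTF-8 bytes (matches code point order).
    static int unsignedByteOrder(String a, String b) {
        return Integer.signum(Arrays.compareUnsigned(utf8(a), utf8(b)));
    }

    // Signed lexicographic order: Java bytes 0x80..0xFF compare as negative numbers.
    static int signedByteOrder(String a, String b) {
        return Integer.signum(Arrays.compare(utf8(a), utf8(b)));
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fullwidthBang = "\uFF01";    // U+FF01, UTF-8: EF BC 81
        String grinning = "\uD83D\uDE00";   // U+1F600, UTF-8: F0 9F 98 80
        System.out.println(utf16Order(fullwidthBang, grinning));        // prints 1
        System.out.println(unsignedByteOrder(fullwidthBang, grinning)); // prints -1

        String eAcute = "\u00E9";           // U+00E9, UTF-8: C3 A9
        System.out.println(signedByteOrder("a", eAcute));               // prints 1
        System.out.println(unsignedByteOrder("a", eAcute));             // prints -1
    }
}
```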

Additionally, the term encoding in Lucene trunk (aka 4.0) will change to BOCU-1 for better space efficiency with Eastern languages, and numeric terms will also be saved as raw byte[] using the full 8 bits.
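Here is a sketch of what "raw byte[] with full 8 bits" can look like. It assumes a common order-preserving layout: flip the sign bit, then write the value big-endian. This is an illustration only, not Lucene's actual encoding, and the names are hypothetical. Unsigned byte order then matches numeric order, but a signed comparison does not. The bytes are also usually not valid UTF-8, so converting them to a String silently replaces them with U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical order-preserving 8-byte layout for longs; not Lucene's actual format.
class RawNumericKeys {
    static byte[] encode(long value) {
        // Flip the sign bit so negatives sort first, then write big-endian (ByteBuffer's default).
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong() ^ Long.MIN_VALUE;
    }

    // Invalid UTF-8 sequences decode to U+FFFD, so the original bytes are lost.
    static boolean survivesStringRoundTrip(byte[] key) {
        String s = new String(key, StandardCharsets.UTF_8);
        return Arrays.equals(key, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] minusOne = encode(-1L); // 7F FF FF FF FF FF FF FF
        byte[] zero = encode(0L);      // 80 00 00 00 00 00 00 00
        System.out.println(Arrays.compareUnsigned(minusOne, zero) < 0); // prints true
        System.out.println(Arrays.compare(minusOne, zero) < 0);         // prints false (signed)
        System.out.println(survivesStringRoundTrip(zero));              // prints false
    }
}
```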

> Row keys should be byte[]s, not Strings
> ---------------------------------------
>
>                 Key: CASSANDRA-767
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-767
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.7 beta 1
>
>         Attachments: 0001-Implement-compaction-benchmark.patch, 0002-Implement-a-legacy-sstable-test.patch, 0003-Store-bytes-in-DecoratedKey-and-cleanup-dead-code.patch, 0004-Extract-read-writeName.patch, 0005-Convert-IPartitioner-disk-key-format-to-bytes.patch, 0006-Bump-SSTable-version-to-c-remove-utf16-encoding-from.patch
>
>
> This issue has come up numerous times, and we've dealt with a lot of pain because of it: let's get it knocked out.
> Keys being Java Strings can make it painful to use Cassandra from other languages, encoding binary data like integers as Strings is very inefficient, and there is a disconnect between our column data types and the plain String treatment we give row keys.
> The key design decision that needs discussion is: Should we apply the column AbstractTypes to row keys? If so, how do Partitioners change?
