Posted to commits@cassandra.apache.org by "Folke Behrens (JIRA)" <ji...@apache.org> on 2010/07/01 02:49:50 UTC

[jira] Commented: (CASSANDRA-1232) UTF8Type.compare() is slow and dangerous

    [ https://issues.apache.org/jira/browse/CASSANDRA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884109#action_12884109 ] 

Folke Behrens commented on CASSANDRA-1232:
------------------------------------------

I think you guys need to clarify the purpose of AbstractType.getString(). The Javadoc states that it returns "a string representation of the bytes suitable for log messages." So it's meant for humans, and throwing an exception there because of invalid byte sequences is probably counterintuitive.
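
To make the distinction concrete, here is a minimal sketch of the two decoding behaviours being discussed. It is not Cassandra's code: the class and method names (Utf8Decoding, forLog, strict) are hypothetical, and it assumes Java 7 or later for StandardCharsets. A log-oriented method can decode leniently and never throw, while a validation path can reject malformed input explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cassandra.
class Utf8Decoding {
    // Lenient decoding for human-readable output: malformed sequences
    // are replaced with U+FFFD and no exception is thrown.
    static String forLog(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding for validation: malformed input raises
    // a CharacterCodingException instead of being replaced.
    static String strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] bad = {0x61, (byte) 0xFF, 0x62}; // 0xFF never occurs in valid UTF-8
        System.out.println(forLog(bad).length());
        try {
            strict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

Under this split, a getString() meant for log messages would follow the lenient path, and the strict path would belong wherever input is validated.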



> UTF8Type.compare() is slow and dangerous
> ----------------------------------------
>
>                 Key: CASSANDRA-1232
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1232
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Folke Behrens
>            Assignee: Nick Bailey
>             Fix For: 0.6.4
>
>         Attachments: 0001-Fixes-to-UTF8Type-compare-and-getString-methods.patch
>
>
> UTF8Type converts both byte arrays into Strings and then compares them. This is unnecessary and slow because UTF-8 encoded strings are already directly comparable byte by byte. Higher codepoints yield higher initial and subsequent bytes. One can safely use BytesType.compare() for UTF-8. Maybe UTF8Type should be a subclass only overriding getString().
> BTW, it's also dangerous to ignore invalid byte sequences. At this point the byte array should contain valid UTF-8.
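
The ordering claim in the description can be illustrated with a small sketch. This is not BytesType.compare() itself; the class and method names (Utf8ByteCompare, compareUnsigned) are hypothetical, and the code assumes Java 7 or later. The one detail that matters in Java is that bytes are signed, so each byte has to be masked to 0..255 before comparing; otherwise multi-byte sequences (whose bytes are all >= 0x80) would sort before ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cassandra.
class Utf8ByteCompare {
    // Lexicographic comparison that treats each byte as unsigned (0..255).
    // For valid UTF-8 this yields the same order as comparing code points.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] ascii = "z".getBytes(StandardCharsets.UTF_8);      // U+007A -> 7A
        byte[] latin = "\u00e9".getBytes(StandardCharsets.UTF_8); // U+00E9 -> C3 A9
        byte[] cjk = "\u4e2d".getBytes(StandardCharsets.UTF_8);   // U+4E2D -> E4 B8 AD
        System.out.println(compareUnsigned(ascii, latin)); // negative: U+007A < U+00E9
        System.out.println(compareUnsigned(latin, cjk));   // negative: U+00E9 < U+4E2D
    }
}
```

No String objects are allocated and no decoding takes place, which is the source of the speedup the description asks for.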
