Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2009/06/19 22:40:07 UTC

[jira] Created: (CASSANDRA-242) Implement method to "evenly" split a Range

Implement method to "evenly" split a Range
------------------------------------------

                 Key: CASSANDRA-242
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-242
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Stu Hood


Two tickets currently depend on being able to deterministically split a Range object into two "even" Ranges.

This can be accomplished with RandomPartitioner/BigIntegerToken by taking the average of the tokens, but the OrderPreservingPartitioner/StringToken implementation uses a Java Collator to define the sort order of Tokens, which means that they are not necessarily sorted in byte/char order.
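
As a rough illustration of the averaging idea for BigInteger tokens, here is a minimal sketch. The class name, the method name and the explicit ringSize parameter are assumptions made for this example and are not taken from any patch on this ticket; the wrap-around handling is just one possible choice.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of Cassandra: midpoint of two tokens on a ring.
final class BigIntegerMidpointSketch {
    // ringSize is an assumed parameter: the number of distinct token values.
    static BigInteger midpoint(BigInteger left, BigInteger right, BigInteger ringSize) {
        // Distance walked from left to right, wrapping around at ringSize.
        BigInteger distance = right.subtract(left).mod(ringSize);
        // Step half of that distance from left, wrapping again if needed.
        return left.add(distance.shiftRight(1)).mod(ringSize);
    }

    public static void main(String[] args) {
        BigInteger ring = BigInteger.valueOf(100);
        // Plain range: halfway between 10 and 20.
        System.out.println(midpoint(BigInteger.valueOf(10), BigInteger.valueOf(20), ring));
        // Wrapping range: from 90, past the end of the ring, to 10.
        System.out.println(midpoint(BigInteger.valueOf(90), BigInteger.valueOf(10), ring));
    }
}
```

When left equals right this sketch simply returns left; whether that case should instead mean "the whole ring" is a design decision the sketch does not make.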

Collator.getCollationKey(String).toByteArray() gets you a sortable byte array, but there is no publicly accessible API for converting a similar byte array back into a String.
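
To make the ordering problem concrete, the following sketch contrasts char order with Collator order and shows where the sortable bytes come from. It uses only java.text.Collator from the JDK; the class name and the choice of Locale.ENGLISH are assumptions for the example, not what OrderPreservingPartitioner necessarily uses.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class: char order versus Collator order.
final class CollationOrderSketch {
    // A fixed locale keeps the example reproducible (an assumption of this sketch).
    static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);

    // Order by UTF-16 code units, as String.compareTo does.
    static int charOrder(String a, String b) {
        return a.compareTo(b);
    }

    // Order as defined by the Collator's rules.
    static int collatorOrder(String a, String b) {
        return COLLATOR.compare(a, b);
    }

    // The sortable byte array mentioned above; there is no public inverse of this.
    static byte[] sortableBytes(String s) {
        return COLLATOR.getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        // In char order 'B' (0x42) sorts before 'a' (0x61) ...
        System.out.println("char order, a vs B:     " + Integer.signum(charOrder("a", "B")));
        // ... but the English collator places "a" before "B".
        System.out.println("collator order, a vs B: " + Integer.signum(collatorOrder("a", "B")));
        System.out.println("key length for \"a\":     " + sortableBytes("a").length);
    }
}
```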

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747031#action_12747031 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

It's the COPP changes that make that related in my mind. But I guess it's not that big a deal.

> Implement method to "evenly" split a Range
> ------------------------------------------
>
>                 Key: CASSANDRA-242
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-242
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.4
>
>         Attachments: CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242.diff
>
>
> Two tickets currently depend on being able to deterministically split a Range object into two "even" Ranges.
> This can be accomplished with RandomPartitioner/BigIntegerToken by taking the average of the tokens, but the OrderPreservingPartitioner/StringToken implementation uses a Java Collator to define the sort order of Tokens, which means that they are not necessarily sorted in byte/char order.
> Collator.getCollationKey(String).toByteArray() gets you a sortable byte array, but there is no publicly accessible API for converting a similar byte array back into a String.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737746#action_12737746 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

> The byte[] returned by getCollatedBytes are equivalent to those returned by Collator.getCollationKey(string).toByteArray

Ah, which the JVM gives you.  Okay.

I think we have too many layers of abstraction still, but the approach is reasonable.

(Also, the CollationKey docs imply that there is no need to convert the bytes to ints before comparison.)




[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737398#action_12737398 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

Back when we discussed this in IRC, I think we concluded that although there is no way to split a Range, splitting a group of tokens and taking the median is almost as useful.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748355#action_12748355 ] 

Hudson commented on CASSANDRA-242:
----------------------------------

Integrated in Cassandra #179 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/179/])
    fix unit test.  patch by jbellis for 
add midpoint method to IPartitioner.  patch by Stu Hood; reviewed by jbellis for 
refactors COPP to use BytesToken.  patch by Stu Hood; reviewed by jbellis for 




[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242.diff

Alright, here is a new version of the patch that removes the layer of indirection that was in the previous patch.

CollatingOrderPreservingPartitioner now uses a BytesToken to explicitly indicate that the token isn't supposed to be human readable.

Thoughts? I think we need to get this merged before 0.4 goes out, since it makes a fundamental change to COPP.



[jira] Reopened: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet reopened CASSANDRA-242:
--------------------------------------


`ant test` is failing on trunk with this commit.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737723#action_12737723 ] 

Stu Hood commented on CASSANDRA-242:
------------------------------------

> But this means the only comparator you can use is the byte-order one.
The byte[] returned by getCollatedBytes is equivalent to the one returned by Collator.getCollationKey(string).toByteArray(), so comparing that byte[] with another byte[] generated by the same method gives the same result as calling Collator.compare(string1, string2).

> You can't do Range comparisons with byte order, and then sort keys with the "real" comparator.
The byte[] comparison is the new "real" comparator, because it provides the same functionality that the older one did.

> That is why IMO the split code should just be left up to the Partitioner.
The split code hasn't been implemented yet, but it will definitely be added to the partitioner: a way to compare a byte[] with no (real) string representation to a String is one of the prerequisites (unless this looks too ugly, and 3) looks better).
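
The equivalence claimed above can be checked with a small sketch. getCollatedBytes is not shown in this thread, so the collatedBytes helper below is a stand-in built directly on Collator.getCollationKey(string).toByteArray(); the class name, the locale, and the decision to compare bytes as unsigned values are assumptions of this example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical check: byte[] comparison of collation keys versus Collator.compare.
final class CollatedBytesCompareSketch {
    static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);

    // Stand-in for the getCollatedBytes method discussed in the comment.
    static byte[] collatedBytes(String s) {
        return COLLATOR.getCollationKey(s).toByteArray();
    }

    // Lexicographic comparison, treating every byte as an unsigned value (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        // A shorter array that is a prefix of the longer one sorts first.
        return Integer.compare(a.length, b.length);
    }

    // True when both comparisons order the two strings the same way.
    static boolean agrees(String s1, String s2) {
        int viaBytes = Integer.signum(compareUnsigned(collatedBytes(s1), collatedBytes(s2)));
        int viaCollator = Integer.signum(COLLATOR.compare(s1, s2));
        return viaBytes == viaCollator;
    }

    public static void main(String[] args) {
        System.out.println(agrees("a", "B"));
        System.out.println(agrees("apple", "apricot"));
        System.out.println(agrees("apple", "apple"));
    }
}
```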



[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242.diff

Fixed to apply against current trunk (getInitialToken -> getToken): no other changes.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747027#action_12747027 ] 

Stu Hood commented on CASSANDRA-242:
------------------------------------

> well, I've said that this is a fairly large patch to review all-at-once, and suggested one
> way to break it into pieces. was that a bad suggestion?
Hmm, I actually thought you were joking. I'll split it.

> at a high level the way it makes sense in my mind is to make the refactor of decoration
> to (Token, Key) first, and then add this as a decoratedkey implementation later. 
As I mentioned in https://issues.apache.org/jira/browse/CASSANDRA-242?focusedCommentId=12746188&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12746188 I don't see how this ticket is at all related to the decoratedKey refactor.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737719#action_12737719 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

Maybe I am slow -- I don't see what problem this is solving.

It looks like the reasoning is
 - we can't just convert all Tokens to byte[] and average them to get a new midpoint Token, because you are not guaranteed to be able to invert the averaged byte[] to a valid Token
 - so let's do all Token comparisons as byte[] and not try to do the inversion

But this means the only comparator you can use is the byte-order one.  You can't do Range comparisons with byte order, and then sort keys with the "real" comparator.  Bootstrap and cross-node range queries both rely on using the same comparator with Tokens as with keys.  That is why IMO the split code should just be left up to the Partitioner.

Have I misunderstood?



[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242.diff

Updated storage-conf.xml to apply to trunk.

Any feedback? I'd really like to get this change committed before 0.4, and before anyone starts using COPP.



[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242_part-2.diff
                CASSANDRA-242_part-1.diff

Part 1 of the patch refactors COPP to use BytesToken.
Part 2 of the patch adds the midpoint() method and tests.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737441#action_12737441 ] 

Stu Hood commented on CASSANDRA-242:
------------------------------------

My current thoughts are that we either:
 1. Add a BytewiseToken, which can still be compared to StringToken via a common interface. We can fairly easily generate a byte array that falls halfway between the CollationKey.toByteArray of two StringTokens, but we can't make it back into a String without understanding the JVM we are running in.
 2. Remove the Collator from OPP entirely in favor of byte order and letting someone who values lexical sort implement it.

Any other ideas?
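
For option 1, generating a byte array that falls halfway between two others might look roughly like the sketch below. This is an illustration under stated assumptions, not the patch: the class and method names are made up, the arrays are treated as unsigned big-endian values padded on the right with zeros, and ranges that wrap around the ring are not handled.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper: midpoint of two byte arrays that sort in unsigned byte order.
final class ByteArrayMidpointSketch {
    static byte[] midpoint(byte[] left, byte[] right) {
        // One extra trailing byte keeps the half that would otherwise be lost when the sum is odd.
        int len = Math.max(left.length, right.length) + 1;
        // Arrays.copyOf pads with zeros on the right, which preserves the ordering of the two inputs.
        BigInteger l = new BigInteger(1, Arrays.copyOf(left, len));
        BigInteger r = new BigInteger(1, Arrays.copyOf(right, len));
        BigInteger mid = l.add(r).shiftRight(1);

        // toByteArray() may be shorter than len or carry a leading sign byte, so right-align it.
        byte[] raw = mid.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        byte[] mid = midpoint(new byte[] {0x01}, new byte[] {0x02});
        System.out.println(Arrays.toString(mid));
    }
}
```

The result is only a byte[]; as noted in option 1, nothing here turns it back into a String.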



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737589#action_12737589 ] 

Stu Hood commented on CASSANDRA-242:
------------------------------------

After posting 1), I thought about it some more and realized it would be a relatively smooth change: I'm leaning toward that approach.

Rather than an interface, we could make StringToken an abstract class with two concrete classes: a CollatedStringToken, which holds only a byte[] representing the CollationKey of its String, and a RawStringToken, which holds a String and is equivalent to the current StringToken. The StringTokens would be compared via an abstract getCollatedBytes() method.

Names subject to change, etc.
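
A minimal sketch of that class layout, to make the proposal easier to picture. All class names carry a "Sketch" suffix because they are invented for this example; the locale and the unsigned byte comparison are likewise assumptions, and none of this is the code that was eventually committed.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical base class: tokens are ordered by their collated bytes.
abstract class StringTokenSketch implements Comparable<StringTokenSketch> {
    abstract byte[] getCollatedBytes();

    @Override
    public int compareTo(StringTokenSketch other) {
        byte[] a = getCollatedBytes();
        byte[] b = other.getCollatedBytes();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Compare each byte as an unsigned value.
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}

// Holds the original String and derives its collated bytes on demand.
final class RawStringTokenSketch extends StringTokenSketch {
    private static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);
    private final String value;

    RawStringTokenSketch(String value) {
        this.value = value;
    }

    @Override
    byte[] getCollatedBytes() {
        return COLLATOR.getCollationKey(value).toByteArray();
    }
}

// Holds only collated bytes, for example a computed midpoint with no String form.
final class CollatedStringTokenSketch extends StringTokenSketch {
    private final byte[] collated;

    CollatedStringTokenSketch(byte[] collated) {
        this.collated = collated.clone();
    }

    @Override
    byte[] getCollatedBytes() {
        return collated.clone();
    }
}
```

Because both subclasses compare through getCollatedBytes(), a token built from a String and a token built from raw bytes can be ordered against each other.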



[jira] Assigned: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood reassigned CASSANDRA-242:
----------------------------------

    Assignee: Stu Hood



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747998#action_12747998 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

Thanks for splitting the patch.  Looks like a good approach.

Patch 1: let's use an annotation instead of special interface to indicate order-preserving-ness
Both, but especially patch 2: please follow http://wiki.apache.org/cassandra/CodeStyle




[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748026#action_12748026 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

> Patch 1: let's use an annotation instead of special interface to indicate order-preserving-ness 

Or, even simpler, just have a boolean isOrdered method in IPartitioner.



[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242.diff

The previous patch didn't test the case where a BytesToken wasn't encodable as UTF-8, and that case would have failed. In this version, COPP.TokenFactory encodes Tokens as hex when they need to be displayed as Strings (which isn't often: only when someone manually configures their initialToken or updates the token for a node).
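
To illustrate why hex is a safe display form for arbitrary token bytes, here is a small sketch. The class and method names are hypothetical and this is not the code from the attached patch; it only shows that every byte array round-trips through a hex String, which is not true of UTF-8 decoding.

```java
// Hypothetical helper, not taken from the patch: hex round-trip for token bytes.
final class HexTokenStrings {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Encodes every byte as two lowercase hex digits. */
    static String toHex(byte[] token) {
        StringBuilder sb = new StringBuilder(token.length * 2);
        for (byte b : token) {
            sb.append(DIGITS[(b >> 4) & 0x0f]).append(DIGITS[b & 0x0f]);
        }
        return sb.toString();
    }

    /** Decodes a string produced by toHex back into the original bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

A byte sequence such as 0xff 0x00 0xc3 is not valid UTF-8, but it still has an unambiguous hex form that can be pasted into a configuration file and parsed back.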



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748247#action_12748247 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

Fixed like this:

-        Range r = new Range(new StringToken("0"), new StringToken("zzzzzz"));
+        Range r = new Range(CollatingOrderPreservingPartitioner.MINIMUM, new BytesToken("zzzzzz".getBytes()));



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747008#action_12747008 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

At a high level, the order that makes sense to me is to do the refactor of decoration to (Token, Key) first, and then add this as a decoratedkey implementation later.

I don't mind dropping COPP for 0.4 if that's a stumbling block. We can check on the mailing list to be sure, but I'm 90% sure nobody uses it.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737543#action_12737543 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

I am fine with 2) as long as the range split method is part of Partitioner too so it's easily overridable in one place.
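
As a rough sketch of what "split lives on the partitioner" could look like, the interface below puts a midpoint method next to the token type. The names are hypothetical, and the BigInteger case simply averages the two tokens as the issue description suggests. It assumes left <= right and does not handle ranges that wrap around the ring.

```java
// Hypothetical interface: the split logic is owned by the partitioner,
// so each token type can override it in one place.
interface SplittingPartitioner<T> {
    /** Returns a token roughly halfway between left and right. */
    T midpoint(T left, T right);
}

// Sketch for BigInteger tokens: the midpoint is the (floored) average.
// Assumes non-negative tokens with left <= right; wrapping ranges are not handled.
final class BigIntegerMidpoint implements SplittingPartitioner<java.math.BigInteger> {
    public java.math.BigInteger midpoint(java.math.BigInteger left, java.math.BigInteger right) {
        return left.add(right).shiftRight(1);
    }
}
```

An order-preserving partitioner would supply its own midpoint over whatever sortable form its tokens use, which is the part the rest of this ticket is about.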



[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242.diff

This patch implements 1) via a 'Collatable' abstract class, with Collatable.String and Collatable.Bytes.

The main wart is that a Collatable.Bytes object will throw a RuntimeException if you attempt to call asString() on it, but it might still be a reasonable approach for plugging in a Token that is never meant to be serialized or decorated.

A third option would be 3): add an explicit third state for Tokens, so that a Token is plain, decorated, or -binary-. Once a token is in the binary state, it cannot be converted back to plain or decorated. The explicitness is key.
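
The shape of option 1) might look roughly like the sketch below. It is not the attached patch: the class is renamed CollatableSketch, the String variant is called Str to avoid shadowing java.lang.String, and asBytes() uses plain UTF-8 as a simplifying assumption, whereas the ticket itself is concerned with Collator-derived sort keys.

```java
// Hypothetical sketch of a 'Collatable' base class with a String-backed and
// a bytes-backed variant. Names and the UTF-8 choice are assumptions.
abstract class CollatableSketch {
    abstract byte[] asBytes();
    abstract String asString();

    /** String-backed value: both views are available. */
    static final class Str extends CollatableSketch {
        private final String value;
        Str(String value) { this.value = value; }
        byte[] asBytes() { return value.getBytes(java.nio.charset.StandardCharsets.UTF_8); }
        String asString() { return value; }
    }

    /** Bytes-backed value: there is no way back to a String. */
    static final class Bytes extends CollatableSketch {
        private final byte[] value;
        Bytes(byte[] value) { this.value = value.clone(); }
        byte[] asBytes() { return value.clone(); }
        String asString() {
            // The "wart": this view fails at runtime instead of at compile time.
            throw new UnsupportedOperationException("binary value has no String form");
        }
    }
}
```

The runtime failure in Bytes.asString() is the trade-off the comment describes; option 3) would instead make the binary state explicit so callers cannot ask for the conversion in the first place.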



[jira] Updated: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-242:
-------------------------------

    Attachment: CASSANDRA-242_part-2.diff
                CASSANDRA-242_part-1.diff

The latest versions should include all of the coding style fixes, plus a boolean preservesOrder flag for IPartitioners.



[jira] Commented: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747004#action_12747004 ] 

Jonathan Ellis commented on CASSANDRA-242:
------------------------------------------

Well, I've said that this is a fairly large patch to review all at once, and suggested one way to break it into pieces. Was that a bad suggestion?

(I'm sure it's clear to you, but to put it in perspective, this is not much smaller than the first Big Patch for CASSANDRA-342, which I think you'll agree benefited a lot from being split up.)



[jira] Resolved: (CASSANDRA-242) Implement method to "evenly" split a Range

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-242.
--------------------------------------

    Resolution: Fixed

> Implement method to "evenly" split a Range
> ------------------------------------------
>
>                 Key: CASSANDRA-242
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-242
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.4
>
>         Attachments: CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242.diff, CASSANDRA-242_part-1.diff, CASSANDRA-242_part-1.diff, CASSANDRA-242_part-2.diff, CASSANDRA-242_part-2.diff
>
>
> Two tickets currently depend on being able to deterministically split a Range object into two "even" Ranges.
> This can be accomplished with RandomPartitioner/BigIntegerToken by taking the average of the tokens, but the OrderPreservingPartitioner/StringToken implementation uses a Java Collator to define the sort order of Tokens, which means that they are not necessarily sorted in byte/char order.
> Collator.getCollationKey(String).toByteArray() gets you a sortable byte array, but there is no publicly accessible API for converting a similar byte array back into a String.
