Posted to user@cassandra.apache.org by Guillermo Winkler <gw...@inconcertcc.com> on 2010/12/06 17:11:14 UTC

Sorting problem on supercolumns names using OPP on 0.6.2

Hi, I've the following schema defined:

EventsByUserDate : {
    UserId : {
        epoch: { // SC
            IID,
            IID,
            IID,
            IID
        },
        // and the other events in time
        epoch: {
            IID,
            IID,
            IID
        }
    }
}
<ColumnFamily ColumnType="Super" CompareWith="LongType"
CompareSubcolumnsWith="BytesType" Name="EventsByUserDate "/>

I expect this to store all the event IDs for a user ordered by date
(seconds since the epoch, as a long long). I'm using
OrderPreservingPartitioner.

But a call to:

GetSuperRangeSlices("EventsByUserDate ",  --column family
                    "",                   --supercolumn
                    userId,               --startkey
                    userId,               --endkey
                    {
                      column_names = {},
                      slice_range = {
                        start = "",
                        finish = "",
                        reversed = true,
                        count = 20 }
                    },
                    1                     --total keys
                   )

does not sort correctly by supercolumn (the supercolumn names come out
unsorted). This is a sample output for the previous query, using Thrift
directly:

SC 1291648883
SC 1291588465
SC 1291588453
SC 1291586385
SC 1291587408
SC 1291588174
SC 1291585331
SC 1291587116
SC 1291651116
SC 1291586332
SC 1291588548
SC 1291588036
SC 1291648703
SC 1291583651
SC 1291583650
SC 1291583649
SC 1291583648
SC 1291583647
SC 1291583646
SC 1291587485


Anything I'm missing regarding sorting schemes?

Thanks,
Guille

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by David Replogle <da...@steketeegreiner.com>.

+1

I'm doing this in my C++ client so contact me offlist if you need code

David

On Dec 6, 2010, at 1:33 PM, Tyler Hobbs <ty...@riptano.com> wrote:

> Also, thought I should mention:
> 
> When you make a std::string out of the char[], make sure to use the constructor with the size_t parameter (size 8).
> 
> - Tyler

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Tyler Hobbs <ty...@riptano.com>.

Also, thought I should mention:

When you make a std::string out of the char[], make sure to use the
constructor with the size_t parameter (size 8).
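
The length matters because a packed long usually contains zero bytes, and the single-argument std::string constructor treats its input as a C string and stops at the first one. A minimal sketch of the difference, assuming a buffer that already holds the big-endian bytes; the helper names and the example constant are illustrative and not from this thread's code:

```cpp
#include <string>

// Illustrative value: big-endian bytes of 1291657342 (0x000000004CFD207E).
// Note the four leading zero bytes.
const char kExample[8] = {'\x00', '\x00', '\x00', '\x00',
                          '\x4C', '\xFD', '\x20', '\x7E'};

// Hypothetical helper: copies exactly 8 bytes, zero bytes included.
std::string with_length(const char (&buf)[8]) {
    return std::string(buf, sizeof(buf));
}

// Hypothetical helper showing the pitfall: reads buf as a C string, so it
// stops at the first zero byte. Only well-defined when buf contains one.
std::string as_c_string(const char (&buf)[8]) {
    return std::string(buf);
}
```

With kExample, with_length keeps all 8 bytes, while as_c_string produces an empty string because the first byte is zero.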

- Tyler

On Mon, Dec 6, 2010 at 12:29 PM, Tyler Hobbs <ty...@riptano.com> wrote:

> That should be "big-endian".

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Tyler Hobbs <ty...@riptano.com>.

That should be "big-endian".

On Mon, Dec 6, 2010 at 12:29 PM, Tyler Hobbs <ty...@riptano.com> wrote:

> How are you packing the longs into strings?  The large negative numbers
> point to that being done incorrectly.
>
> Bitshifting and putting each byte of the long into a char[8] then
> stringifying the char[] is the best way to go.  Cassandra expects
> big-ending longs, as well.
>
> - Tyler

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Guillermo Winkler <gw...@inconcertcc.com>.

okkkk now it's behaving :)

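// Byte-swap a 64-bit value with two 32-bit htonl() calls
// (correct on a little-endian host, where htonl() swaps bytes).
#define htonll(x) \
    ((((unsigned long long)htonl((unsigned int)((x) & 0xFFFFFFFFULL))) << 32) | \
     (unsigned long long)htonl((unsigned int)(((unsigned long long)(x)) >> 32)))

string result;
result.resize(sizeof(long long));
long long bigendian = htonll(l);  // l holds the epoch seconds
memcpy(&result[0], &bigendian, sizeof(long long));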

=> (super_column=1291668233,

(column=35646130653133632d333766642d343231312d386138382d393936383966326462643364,
value=2010-12-06 20:43:53.000, timestamp=1291668233034754)

(column=61323432323262622d353734342d346133322d393530312d626238343365346363376335,
value=2010-12-06 20:43:53.000, timestamp=1291668233169771)

(column=66633136333166382d373733622d343734652d393265362d376162633364316564383964,
value=2010-12-06 20:43:53.000, timestamp=1291668233302288))
=> (super_column=1291668232,

(column=61343765353432352d613066392d343334392d613761392d336635313631633261303161,
value=2010-12-06 20:43:52.000, timestamp=1291668232563694)

(column=64343635396433382d316166302d343732662d623737392d336634303931323961373364,
value=2010-12-06 20:43:52.000, timestamp=1291668232889235))
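
When super column names are read back from a slice, the same 8 bytes have to be decoded in big-endian order to recover the epoch value. A minimal sketch, assuming the name arrives as an 8-byte std::string; the helper name unpack_long_be is illustrative and not part of the Thrift API or of the code above:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode an 8-byte big-endian string into a signed
// 64-bit value.
std::int64_t unpack_long_be(const std::string& name) {
    if (name.size() != 8) {
        throw std::invalid_argument("expected exactly 8 bytes");
    }
    std::uint64_t bits = 0;
    for (unsigned char byte : name) {
        // The first byte is the most significant, so earlier bytes are
        // shifted further left as later ones arrive.
        bits = (bits << 8) | byte;
    }
    return static_cast<std::int64_t>(bits);
}
```

Going through unsigned char avoids sign extension for bytes above 0x7F.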

Thanks again!
Guille




On Mon, Dec 6, 2010 at 5:45 PM, Guillermo Winkler
<gw...@inconcertcc.com>wrote:

> uh, ok I was just copying :P
>
>         string result;
> result.resize(sizeof(long long));
>  memcpy(&result[0], &l, sizeof(long long));
>
> I'll try and let you know
>
> many thanks!

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Guillermo Winkler <gw...@inconcertcc.com>.

uh, ok I was just copying :P

string result;
result.resize(sizeof(long long));
memcpy(&result[0], &l, sizeof(long long));

I'll try and let you know

many thanks!


On Mon, Dec 6, 2010 at 4:29 PM, Tyler Hobbs <ty...@riptano.com> wrote:

> How are you packing the longs into strings?  The large negative numbers
> point to that being done incorrectly.
>
> Bitshifting and putting each byte of the long into a char[8] then
> stringifying the char[] is the best way to go.  Cassandra expects
> big-ending longs, as well.
>
> - Tyler

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Tyler Hobbs <ty...@riptano.com>.

How are you packing the longs into strings?  The large negative numbers
point to that being done incorrectly.

Bitshifting and putting each byte of the long into a char[8] then
stringifying the char[] is the best way to go.  Cassandra expects
big-ending longs, as well.
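
The bit-shifting approach described above could look roughly like the following. This is a sketch under assumptions, not code from any client library: the helper name pack_long_be is made up for illustration, and it relies only on shifts, so it does not depend on the host's byte order.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode a signed 64-bit value as an 8-byte
// big-endian string suitable for a LongType column name.
std::string pack_long_be(std::int64_t value) {
    // Shift the unsigned representation so negative values are
    // well-defined too.
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    char buf[8];
    for (int i = 0; i < 8; ++i) {
        // buf[0] receives the most significant byte, buf[7] the least.
        buf[i] = static_cast<char>((bits >> (8 * (7 - i))) & 0xFF);
    }
    // Explicit length, because the buffer may contain zero bytes.
    return std::string(buf, sizeof(buf));
}
```

For non-negative values such as epoch seconds, strings packed this way also compare bytewise in the same order as the numbers they encode.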

- Tyler

On Mon, Dec 6, 2010 at 11:55 AM, Guillermo Winkler <gwinkler@inconcertcc.com> wrote:

> I'm using thrift in C++ and inserting the results in a vector of pairs, so
> client-side-mangling does not seem to be the problem.
>
> Also I'm using a "test" column where I insert the same value I'm using as
> super column name (in this case the same date converted to string) and when
> queried using cassandra cli is unsorted too:
>
> cassandra> get Events.EventsByUserDate ['guille']
> => (super_column=9088542550893002752,
>
> (column=4342323443303834363833383437454339364433324530324538413039373736,
> value=2010-12-06 17:43:36.000, timestamp=1291657416526732))
> => (super_column=5990347482238812160,
>
> (column=41414e4c6b54696d6532423656566e6869667a336f654b6147393d2d395a4e797441397a744f39686d3147392b406d61696c2e676d61696c2e636f6d,
> value=2010-12-06 17:46:08.000, timestamp=1291657568569039))
> => (super_column=-3089190841516818432,
>
> (column=3634343644353236463830303437363542454245354630343845393533373337,
> value=2010-12-06 17:44:47.000, timestamp=1291657487450738))
> => (super_column=-4026221038986592256,
>
> (column=62303232396330372d636430612d343662332d623834382d393632366136323061376532,
> value=2010-12-06 17:39:50.000, timestamp=1291657190117981))

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Guillermo Winkler <gw...@inconcertcc.com>.

I'm using thrift in C++ and inserting the results in a vector of pairs, so
client-side-mangling does not seem to be the problem.

I'm also using a "test" column where I insert the same value I'm using as the
super column name (in this case the same date converted to a string), and when
queried with the Cassandra CLI it comes out unsorted too:

cassandra> get Events.EventsByUserDate ['guille']
=> (super_column=9088542550893002752,

(column=4342323443303834363833383437454339364433324530324538413039373736,
value=2010-12-06 17:43:36.000, timestamp=1291657416526732))
=> (super_column=5990347482238812160,

(column=41414e4c6b54696d6532423656566e6869667a336f654b6147393d2d395a4e797441397a744f39686d3147392b406d61696c2e676d61696c2e636f6d,
value=2010-12-06 17:46:08.000, timestamp=1291657568569039))
=> (super_column=-3089190841516818432,

(column=3634343644353236463830303437363542454245354630343845393533373337,
value=2010-12-06 17:44:47.000, timestamp=1291657487450738))
=> (super_column=-4026221038986592256,

(column=62303232396330372d636430612d343662332d623834382d393632366136323061376532,
value=2010-12-06 17:39:50.000, timestamp=1291657190117981))
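
One way to read this output: if an epoch value is copied out of memory on a little-endian machine and then decoded as a big-endian long, its bytes are seen in reverse. Reversing the bytes of the first super column name above, 9088542550893002752, gives 1291657342, which falls in the expected epoch range. The sketch below shows that check; the helper name byte_swap64 is illustrative, and the little-endian host is an assumption about the client machine rather than something stated in the thread.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the byte order of a 64-bit value using
// shifts only, so it behaves the same on any host.
std::uint64_t byte_swap64(std::uint64_t v) {
    std::uint64_t out = 0;
    for (int i = 0; i < 8; ++i) {
        // Take the lowest byte of v and append it to out; after eight
        // rounds the original lowest byte sits in the highest position.
        out = (out << 8) | (v & 0xFF);
        v >>= 8;
    }
    return out;
}
```

Applying byte_swap64 to a positive super column name printed by the CLI recovers the value that was originally written.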




On Mon, Dec 6, 2010 at 3:02 PM, Tyler Hobbs <ty...@riptano.com> wrote:

> What client are you using?  Is it storing the results in a hash map or some
> other type of non-order-preserving dictionary?
>
> - Tyler

Re: Sorting problem on supercolumns names using OPP on 0.6.2

Posted by Tyler Hobbs <ty...@riptano.com>.

What client are you using?  Is it storing the results in a hash map or some
other type of non-order-preserving dictionary?

- Tyler
