
Posted to user@cassandra.apache.org by Alaa Zubaidi <al...@pdf.com> on 2011/02/16 02:06:28 UTC

latest rows

Hi,

What is the best way to retrieve the latest rows from a CF (column family) with OPP (OrderPreservingPartitioner)?

We are using OPP and key range queries, but I cannot find an easy way to
get, for example, the latest 10 keys from a column family with thousands of keys.
I really don't want to create another CF that stores row key names as
columns, then read the latest columns from that CF and use those
row keys to retrieve the latest data.

Regards and Thanks,
Alaa

Re: latest rows

Posted by Alaa Zubaidi <al...@pdf.com>.

Thank you, guys.

On 2/16/2011 1:36 PM, Matthew Dennis wrote:
> +1 on avoiding OPP

-- 
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110  USA
Tel: 408-283-5639 (or 408-280-7900 x5639)
fax: 408-938-6479
email: alaa.zubaidi@pdf.com

Re: latest rows

Posted by Matthew Dennis <md...@datastax.com>.

+1 on avoiding OPP

On Wed, Feb 16, 2011 at 3:27 PM, Tyler Hobbs <ty...@datastax.com> wrote:

>
>> Thanks for your input, but we already have a fixed key format, name:timestamp,
>> and we also need to retrieve the oldest data.
>>
>
> Then you'll need to denormalize and store every row three ways (keyed by
> timestamp, by inverted timestamp, and by the normal key) if you want to be
> able to access them in all three ways using OPP.
>
> I would recommend not using OPP and just using timeline rows. Here's a
> fantastic discussion of OrderPreservingPartitioner vs. RandomPartitioner:
> <http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/>

Re: latest rows

Posted by Tyler Hobbs <ty...@datastax.com>.

> Thanks for your input, but we already have a fixed key format, name:timestamp,
> and we also need to retrieve the oldest data.
>

Then you'll need to denormalize and store every row three ways (keyed by
timestamp, by inverted timestamp, and by the normal key) if you want to be
able to access them in all three ways using OPP.
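A minimal sketch of the three-way denormalization, assuming Python and an in-memory stand-in for OPP that returns rows in sorted string-key order. `OrderedStore`, `write_row`, the three stores, and the 20-digit padding are hypothetical illustrations, not a Cassandra or pycassa API; in a real cluster each store would be its own column family.

```python
from typing import Dict, List

WIDTH = 20  # 2**64 has 20 decimal digits; pad every numeric key to this width


def ts_key(ts: int) -> str:
    """Ascending-time key: oldest rows sort first."""
    return str(ts).zfill(WIDTH)


def inv_key(ts: int) -> str:
    """Inverted-time key: newest rows sort first."""
    return str(2**64 - ts).zfill(WIDTH)


class OrderedStore:
    """Hypothetical in-memory stand-in for one column family under OPP."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def insert(self, key: str, columns: dict) -> None:
        self.rows[key] = columns

    def range_scan(self, count: int) -> List[dict]:
        # OPP returns rows in key order, emulated here by sorting the string keys.
        return [self.rows[k] for k in sorted(self.rows)[:count]]


by_time = OrderedStore()      # oldest first
by_inverted = OrderedStore()  # newest first
by_name = OrderedStore()      # grouped by name, then by time


def write_row(name: str, ts: int, columns: dict) -> None:
    # Denormalize: write the same columns under three different row keys.
    by_time.insert(ts_key(ts), columns)
    by_inverted.insert(inv_key(ts), columns)
    by_name.insert(f"{name}:{ts_key(ts)}", columns)


for name, ts in [("a", 100), ("b", 300), ("c", 200)]:
    write_row(name, ts, {"name": name, "ts": ts})

print([r["ts"] for r in by_time.range_scan(2)])      # [100, 200]
print([r["ts"] for r in by_inverted.range_scan(2)])  # [300, 200]
```

The cost of this design is three writes per row. Note also that two rows with the same timestamp would overwrite each other in `by_time` and `by_inverted`, so keys need a unique suffix if clients can write at the same instant.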

I would recommend not using OPP and just using timeline rows. Here's a
fantastic discussion of OrderPreservingPartitioner vs. RandomPartitioner:
<http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/>
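A minimal sketch of what a timeline row means, assuming Python and an in-memory stand-in for Cassandra. A timeline row is one wide row whose column names are timestamps. Because Cassandra keeps a row's columns sorted by the comparator, the latest N entries are a reversed column slice and the oldest N are a forward slice, and the row keys themselves can be hashed by RandomPartitioner. `TimelineRow`, `record`, and the one-row-per-name layout are hypothetical. With pycassa the reads would roughly correspond to `ColumnFamily.get(key, column_count=N, column_reversed=True)` for the latest entries, but check the client documentation for your version.

```python
import bisect
from typing import Dict, List, Tuple


class TimelineRow:
    """Hypothetical in-memory stand-in for one wide "timeline" row.

    Column names are timestamps; Cassandra keeps a row's columns sorted by
    the column family's comparator, so this class keeps them sorted too.
    """

    def __init__(self) -> None:
        self._names: List[int] = []        # sorted column names (timestamps)
        self._values: Dict[int, str] = {}  # column name -> column value

    def insert(self, ts: int, value: str) -> None:
        if ts not in self._values:
            bisect.insort(self._names, ts)
        self._values[ts] = value

    def latest(self, count: int) -> List[Tuple[int, str]]:
        """Newest first, like a reversed column slice."""
        if count <= 0:
            return []
        return [(ts, self._values[ts]) for ts in reversed(self._names[-count:])]

    def oldest(self, count: int) -> List[Tuple[int, str]]:
        """Oldest first, like a forward column slice."""
        return [(ts, self._values[ts]) for ts in self._names[:max(count, 0)]]


# One timeline row per (hypothetical) name; ordering only matters inside a row.
timelines: Dict[str, TimelineRow] = {}


def record(name: str, ts: int, value: str) -> None:
    timelines.setdefault(name, TimelineRow()).insert(ts, value)


record("sensor1", 100, "first")
record("sensor1", 300, "third")
record("sensor1", 200, "second")
print(timelines["sensor1"].latest(2))  # [(300, 'third'), (200, 'second')]
print(timelines["sensor1"].oldest(1))  # [(100, 'first')]
```

Both the newest and the oldest entries come from the same row, so this covers the "latest" and "oldest" requirements without storing the data three ways.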

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library

Re: latest rows

Posted by Alaa Zubaidi <al...@pdf.com>.

Hi Tyler,

Thanks for your input, but we already have a fixed key format,
name:timestamp, and we also need to retrieve the oldest data.

Thanks

On 2/15/2011 9:07 PM, Tyler Hobbs wrote:
>> But wouldn't using timestamps as row keys cause conflicts?
>>
> Depending on client behavior, yes.  If that's an issue for you, make your
> own UUIDs by appending something random or client-specific to the timestamp.
>

Re: latest rows

Posted by Tyler Hobbs <ty...@datastax.com>.

>
> But wouldn't using timestamps as row keys cause conflicts?
>

Depending on client behavior, yes.  If that's an issue for you, make your
own UUIDs by appending something random or client-specific to the timestamp.
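A minimal sketch of that suggestion, assuming Python, integer timestamps, and a hypothetical `client_id` string per writer: the key is the zero-padded timestamp followed by the client id and a random `uuid4` hex suffix. The `unique_time_key` helper and the colon-separated format are illustrations, not an established key scheme.

```python
import uuid

WIDTH = 20  # fixed width so keys still sort by timestamp first


def unique_time_key(ts: int, client_id: str) -> str:
    """Zero-padded timestamp, then a client id, then a random suffix.

    Two writes with the same timestamp get different keys, while the
    fixed-width prefix keeps lexical (OPP) order equal to time order.
    """
    return f"{str(ts).zfill(WIDTH)}:{client_id}:{uuid.uuid4().hex}"


k1 = unique_time_key(1297818388000000, "client-a")
k2 = unique_time_key(1297818388000000, "client-a")
print(k1 != k2)  # True
```

For newest-first scans, the same suffix could be appended to an inverted, zero-padded timestamp instead of the plain one.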

Re: latest rows

Posted by Tan Yeh Zheng <ye...@chartnexus.com>.

But wouldn't using timestamps as row keys cause conflicts?
On Tue, 2011-02-15 at 19:11 -0600, Tyler Hobbs wrote:
> 
>         What is the best way to retrieve the latest rows from a CF
>         with OPP?
> 
> Use inverted timestamps (for example, 2^64 - timestamp), zero-padded to a
> fixed width, as the row keys. OPP orders keys lexically, so the padding
> keeps key order identical to numeric order.
> 
> This way you can do a normal forward range scan and get the N latest
> rows.

-- 
Best Regards,

Tan Yeh Zheng
Software Programmer

____________ ChartNexus® :: Chart Your Success ____________

ChartNexus Pte. Ltd.

15 Enggor Street #10-01
Realty Center
Singapore 079716
Tel:  (65) 6491 1456
Website: www.chartnexus.com

Re: latest rows

Posted by Tyler Hobbs <ty...@datastax.com>.

> What is the best way to retrieve the latest rows from a CF with OPP?
>

Use inverted timestamps (for example, 2^64 - timestamp), zero-padded to a
fixed width, as the row keys. OPP orders keys lexically, so the padding
keeps key order identical to numeric order.

This way you can do a normal forward range scan and get the N latest rows.
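A minimal sketch of the inverted-timestamp key, assuming Python and integer timestamps in [0, 2^64) (for example, microseconds since the epoch). The 20-digit width is chosen because 2^64 has 20 decimal digits. `inverted_key` and `forward_range_scan` are hypothetical helpers; the latter only emulates the key order an OPP range scan returns.

```python
WIDTH = 20  # len(str(2**64)) == 20


def inverted_key(ts: int) -> str:
    """Row key that sorts newest-first under lexical (OPP) ordering."""
    if not 0 <= ts < 2**64:
        raise ValueError("timestamp must be in [0, 2**64)")
    # Zero-padding matters: unpadded, "10" would sort before "9".
    return str(2**64 - ts).zfill(WIDTH)


def forward_range_scan(keys, count):
    """Emulate an OPP range scan: keys in lexical order, first `count` of them."""
    return sorted(keys)[:count]


stamps = [1297818388000000, 1297818389000000, 1297818387000000]  # microseconds
rows = {inverted_key(ts): ts for ts in stamps}
print([rows[k] for k in forward_range_scan(rows, 2)])
# [1297818389000000, 1297818388000000]
```

The larger the timestamp, the smaller 2^64 - timestamp, so the newest row has the smallest key and comes first in a forward scan.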
