
Posted to user@cassandra.apache.org by "cbertu81@libero.it" <cb...@libero.it> on 2013/09/27 15:24:19 UTC

Refactoring old project

Hi all, in my very old Cassandra schema (started with 0.6, so without 
secondary indexes, and now on 1.0.6) I have a rating & review platform with 
about 1 million reviews. The core of the application is the review that a user 
can leave about a company. At the time I created many CFs: Comments, 
UserComments, CompanyComments and CityComments, and I used timeuuids to keep 
data sorted the way I needed (UserComments/CompanyComments/CityComments did 
not hold the actual comments, just a "reference" [id] to the Comments CF).

Since I need comments sorted by date, what would be the best way to 
rewrite this for Cassandra 2.0?
Obviously all these CFs will merge into one. I need to perform 
queries like:

Get latest X comments in a specific city
Get latest X comments of a company
Get latest X comments of a user

I can't sort client side because, even though a user or company has at most 
about 200 reviews, a city can have 50,000 comments or more.
I know that Murmur3Partitioner is the recommended partitioner, but I wonder 
whether this is a case for the order-preserving partitioner.

A row entry would be something like

CommentID (row key), companyId, userId, text, vote, city

Another idea is to use a composite key made of (city, commentid), so I would 
get each city's comments sorted by comment id for free and could sort 
user/company comments client side. Am I missing something?
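The client-side fallback in the composite-key idea is cheap at the roughly 200-comment scale mentioned for users and companies. Here is a minimal Python sketch that assumes the comments have already been fetched into memory. `Comment` mirrors the row layout above, and `timeuuid_at` is a hypothetical helper that builds deterministic timeuuids for the example. Real code would call `uuid.uuid1()`.

```python
import uuid
from dataclasses import dataclass


def timeuuid_at(ticks: int, node: int = 0x0123456789AB) -> uuid.UUID:
    """Hypothetical helper: a version-1 UUID (timeuuid) with a fixed timestamp.

    `ticks` is the 60-bit count of 100 ns intervals that uuid.UUID.time
    reports. Real code would simply call uuid.uuid1(); fixed values just keep
    the example deterministic.
    """
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                 # time_low
        (ticks >> 32) & 0xFFFF,             # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1
        0x80,                               # clock_seq_hi with RFC 4122 variant
        0x00,                               # clock_seq_low
        node,                               # 48-bit node id
    ))


@dataclass
class Comment:
    comment_id: uuid.UUID  # timeuuid; .time gives its creation timestamp
    company_id: int
    user_id: int
    text: str
    vote: int
    city: str


def latest_client_side(comments: list[Comment], x: int) -> list[Comment]:
    """Sort already-fetched comments newest first and keep the first x."""
    return sorted(comments, key=lambda c: c.comment_id.time, reverse=True)[:x]
```

For a user's or company's 200 reviews this sort is trivial. For a city with 50,000 or more comments, the same call would first require fetching every comment in the city. That is the cost the question is trying to avoid, so the city view needs its ordering done server side.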

TIA,
Carlo

Re: Refactoring old project

Posted by Aaron Morton <aa...@thelastpickle.com>.

I would try:

Comments CF:
row_key: (thing_type : thing_id ) where thing_type is "city" etc
column_name: (comment_id (reversed)) where comment_id is a timeuuid
column_value: the comment. 
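Here is a minimal Python sketch that emulates this Comments CF in memory to make the layout concrete. It is not a Cassandra client. `CommentsCF` and `timeuuid_at` (a helper that builds deterministic timeuuids) are hypothetical names. In Cassandra the column order would come from the reversed comparator, not from `bisect`.

```python
import bisect
import uuid
from collections import defaultdict


def timeuuid_at(ticks: int, node: int = 0x0123456789AB) -> uuid.UUID:
    """Hypothetical helper: a version-1 UUID with a fixed timestamp (see .time)."""
    return uuid.UUID(fields=(ticks & 0xFFFFFFFF, (ticks >> 32) & 0xFFFF,
                             ((ticks >> 48) & 0x0FFF) | 0x1000, 0x80, 0x00, node))


class CommentsCF:
    """In-memory stand-in for the suggested Comments column family.

    Row key:  "thing_type:thing_id", e.g. "city:42" or "user:7".
    Columns:  kept newest first by timeuuid timestamp, like a reversed
              comparator, so "latest X" is just the first X columns.
    """

    def __init__(self) -> None:
        # row_key -> sorted list of (-timestamp, comment_id, body)
        self._rows: dict[str, list[tuple[int, str, str]]] = defaultdict(list)

    def insert(self, thing_type: str, thing_id: int,
               comment_id: uuid.UUID, body: str) -> None:
        row = self._rows[f"{thing_type}:{thing_id}"]
        key = (-comment_id.time, str(comment_id))
        i = bisect.bisect_left(row, key)
        if i < len(row) and row[i][:2] == key:
            row[i] = (*key, body)        # same column name: overwrite its value
        else:
            row.insert(i, (*key, body))  # new column, placed in sorted position

    def latest(self, thing_type: str, thing_id: int, x: int) -> list[str]:
        """Slice the first x columns of the row: the x newest comments."""
        row = self._rows.get(f"{thing_type}:{thing_id}", [])
        return [body for _, _, body in row[:x]]
```

Each review is inserted into three rows (its city, its company and its user). Each "latest X" query then becomes a slice of a single row instead of a client-side sort.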

You will need to be wary of very wide rows: a busy city's row would hold all of its 50,000+ comments.

It's a pretty simple model for CQL 3 as well:

CREATE TABLE comments (
	thing_type 	text,
	thing_id		bigint,
	comment_id	timeuuid,
	body			text,
	user			text,
	PRIMARY KEY ( (thing_type, thing_id), comment_id)
) WITH CLUSTERING ORDER BY (comment_id DESC);
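The first table folds the old UserComments, CompanyComments and CityComments CFs into one, so each review is written once per view. Here is a minimal Python sketch of that fan-out that assumes no driver API. `fan_out` and `latest_query` are hypothetical helpers that only build CQL text and bind values. Those could then be executed, for example as a batch, with whatever client is in use.

```python
import uuid

# Assumed prepared-statement text for the first comments table.
INSERT_CQL = (
    "INSERT INTO comments (thing_type, thing_id, comment_id, body, user) "
    "VALUES (?, ?, ?, ?, ?)"
)


def fan_out(comment_id: uuid.UUID, body: str, user: str,
            city_id: int, company_id: int, user_id: int) -> list[tuple[str, tuple]]:
    """Return one (statement, params) pair per view the review must appear in."""
    views = [("city", city_id), ("company", company_id), ("user", user_id)]
    return [(INSERT_CQL, (thing_type, thing_id, comment_id, body, user))
            for thing_type, thing_id in views]


def latest_query(x: int) -> str:
    """CQL for the x newest comments of one (thing_type, thing_id) partition.

    ORDER BY on the clustering column works whether or not the table
    was created with a descending clustering order.
    """
    return ("SELECT comment_id, body, user FROM comments "
            "WHERE thing_type = ? AND thing_id = ? "
            f"ORDER BY comment_id DESC LIMIT {int(x)}")
```

The trade-off is three writes per review. In return, every "latest X" query is a read of a single partition.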

or 

CREATE TABLE comments (
	thing_type 	text,
	thing_id		bigint,
	created_at 	timestamp,
	user			text,
	comment_id	bigint,
	body			text,
	PRIMARY KEY ( (thing_type, thing_id), created_at, user)
) WITH CLUSTERING ORDER BY (created_at DESC, user ASC);
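In the second table, `user` is part of the clustering key. Presumably this gives two comments created in the same millisecond in the same partition distinct primary keys. CQL INSERTs are upserts, so a repeated primary key silently replaces the earlier row. Here is a minimal Python sketch of that behaviour, assuming `created_at` is held as milliseconds since the epoch. `CommentsTable` is a hypothetical in-memory model, not a Cassandra client.

```python
class CommentsTable:
    """Hypothetical in-memory model of the second comments table.

    Primary key: ((thing_type, thing_id), created_at, user). As with a CQL
    INSERT, writing a primary key that already exists replaces that row.
    """

    def __init__(self) -> None:
        # (thing_type, thing_id, created_at_ms, user) -> (comment_id, body)
        self._rows: dict[tuple[str, int, int, str], tuple[int, str]] = {}

    def insert(self, thing_type: str, thing_id: int, created_at_ms: int,
               user: str, comment_id: int, body: str) -> None:
        self._rows[(thing_type, thing_id, created_at_ms, user)] = (comment_id, body)

    def latest(self, thing_type: str, thing_id: int, x: int) -> list[tuple[int, str]]:
        """x newest rows of one partition: created_at DESC, then user ASC.

        A real partition is stored in clustering order; scanning and
        sorting here only reproduces the result.
        """
        keys = [k for k in self._rows if k[:2] == (thing_type, thing_id)]
        keys.sort(key=lambda k: (-k[2], k[3]))
        return [self._rows[k] for k in keys[:x]]
```

The same user posting twice in the same millisecond to the same partition would still collide. A timeuuid `comment_id` clustering key, as in the first table, is designed to avoid that.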


Hope that helps. 

 
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
