Posted to user@cassandra.apache.org by "cbertu81@libero.it" <cb...@libero.it> on 2013/09/27 15:24:19 UTC
Refactoring old project
Hi all, in my very old Cassandra schema (started with 0.6 -- so without
secondary indexes -- and now on 1.0.6) I have a rating & review platform with
about 1 million reviews. The core of the application is the review that a user
can leave about a company. At the time I created many CFs: Comments,
UserComments, CompanyComments, CityComments -- and I used timeuuid to keep
data sorted in the way I needed (UserComments/CompanyComments/CityComments did
not keep the real comments but just a "reference" [id] to the comment table).
Since I need comments to be sorted by date, what would be the best way to
write it again using Cassandra 2.0?
Obviously all these CFs will merge into one. What I would need is to perform
queries like:
Get latest X comments in a specific city
Get latest X comments of a company
Get latest X comments of a user
I can't sort client side because, even though a user or company has at most
about 200 reviews, a city can have 50,000 or more comments.
I know that Murmur3 is the recommended partitioner, but I wonder whether this
is a case for using the Order Preserving one instead.
A row entry would be something like
CommentID (RowKey) -- companyId -- userId -- text - vote - city
Another idea is to use a composite key made of (city, commentid), so I would
have all comments sorted by city for free and could perform client-side sorting
for user/company comments. Am I missing something?
TIA,
Carlo
Re: Refactoring old project
Posted by Aaron Morton <aa...@thelastpickle.com>.
I would try:
Comments CF:
row_key: (thing_type : thing_id ) where thing_type is "city" etc
column_name: (comment_id (reversed)) where comment_id is a timeuuid
column_value: the comment.
You will need to be wary of very wide rows.
It's a pretty simple model for CQL 3 as well:
CREATE TABLE comments (
  thing_type text,
  thing_id bigint,
  comment_id timeuuid,
  body text,
  user text,
  PRIMARY KEY ( (thing_type, thing_id), comment_id)
) WITH CLUSTERING ORDER BY (comment_id DESC);
or
CREATE TABLE comments (
  thing_type text,
  thing_id bigint,
  created_at timestamp,
  user text,
  comment_id bigint,
  body text,
  PRIMARY KEY ( (thing_type, thing_id), created_at, user)
);
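The access patterns from the original question can be sketched against the
first table above. This is only an illustration, not part of the original
suggestion: the ids (42, 7, 1001), the user name, the comment text and the
timeuuid literal are made up, and it assumes each comment is written once per
thing it belongs to (city, company, user) so that every "latest X" query reads
a single partition.

```cql
-- Write the same comment into three partitions (hypothetical ids and values).
-- The same timeuuid is reused so every copy sorts identically by time.
BEGIN BATCH
  INSERT INTO comments (thing_type, thing_id, comment_id, user, body)
  VALUES ('city', 42, d2177dd0-eaa2-11de-a572-001b779c76e3, 'carlo', 'Great service');
  INSERT INTO comments (thing_type, thing_id, comment_id, user, body)
  VALUES ('company', 7, d2177dd0-eaa2-11de-a572-001b779c76e3, 'carlo', 'Great service');
  INSERT INTO comments (thing_type, thing_id, comment_id, user, body)
  VALUES ('user', 1001, d2177dd0-eaa2-11de-a572-001b779c76e3, 'carlo', 'Great service');
APPLY BATCH;

-- Latest 20 comments in a city. The explicit ORDER BY returns newest first
-- whichever clustering order the table was created with.
SELECT comment_id, user, body
FROM comments
WHERE thing_type = 'city' AND thing_id = 42
ORDER BY comment_id DESC
LIMIT 20;

-- Latest 20 comments of a company, and of a user: same query, different partition.
SELECT comment_id, user, body
FROM comments
WHERE thing_type = 'company' AND thing_id = 7
ORDER BY comment_id DESC
LIMIT 20;

SELECT comment_id, user, body
FROM comments
WHERE thing_type = 'user' AND thing_id = 1001
ORDER BY comment_id DESC
LIMIT 20;
```

Each query touches one partition and comes back already sorted, so no
client-side sorting is needed. The city partitions are the ones that grow
widest, which is where the wide-row warning applies.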
Hope that helps.
-----------------
Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com