You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Courtney Robinson <sa...@live.co.uk> on 2011/10/06 21:34:20 UTC
CF design
I was hoping someone could share their opinions on the following CF designs or suggest a better way of doing it.
My app is constantly receiving new data that contains URLs. I was
thinking of hashing this URL to form a key. The data is a JSON object with
several properties. For now many of its properties will be ignored and only 4
are of interests, URL, title, username, user_rating. Often times the same URL
is received but shared by a different user. I’m wondering if anyone can suggest
a better approach to what I propose below which will be able answer the
following .
Queries:
I’ll be asking the following questions:
1.
Give me the N most frequently shared items over :
a)
The last 30 minutes
b)
The last 24hrs
c)
Between D1 and D2 (where D1 and D2 represents
the start and end date of interest)
2)
Give me the N most shared item over the 3 time
periods above WHERE the average user rating is above 5
3)
Give me X for the item with the ID 123 (where X
is a property for the item with the ID 123)
Proposed approach
Use timestamps as keys in the CF, that should take care of
queries under 1 and partially handle 2
and use each column to store the JSON data, minus the common fields such as the
title which will be the same no matter how many users share the same link (they’ll
have their own columns in the row) other column names will be the user’s
username and the value for those columns will be any JSON left over that’s not
specific to the user.
For the rest of 2, I
can get the N items we’re interested in and calculate the average user rating
for each item client side. Of course using timestamp as key means we need to
maintain an index of the “real” keys/IDs to each item which would allow us to
answer “Give me the item with the ID 123”
Finally to address 3, I was thinking; Using the index get
the timestamp of the item, and on the client side find the property of
interest.
CF1
Timestamp1
Title
value
ID
ID1
Username3
{“rating”:5}
Username2
{“rating”:0}
Username2
{“rating”:4}
Timestamp2
Title
Value1
ID
ID2
Username24
{“rating”:1}
Username87
{“rating”:9}
Username7
{“rating”:2}
CF2
ID1
Timestamp1
ID2
Timestamp2
In the Username column, I'd ideally like to avoid storing the other properties as a JSON but I couldn't think of a way of doing it sensibly when that JSON grows into having 10 other properties.Does this sound like a sensible approach to designing my CFs?