You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Diamond ben <di...@outlook.com> on 2016/11/10 01:47:05 UTC

答复: 答复: A difficult data model with C*

The solution maybe work. However, the play list will grow over time and somebody maybe has ten thousands that will slow down the query and sort . Do you mean the oldest one should be removed when a new play is added?

BTW, the version is 2.1.16 in our live system.


BRs,

BEN

________________________________
发件人: Vladimir Yudovin <vl...@winguzone.com>
发送时间: 2016年11月9日 18:11:26
收件人: user
主题: Re: 答复: A difficult data model with C*

You are welcome! )

>recent ten movies watched by the user within 30 days.
In this case you can't use PRIMARY KEY (user_name, video_id), as video_id is demanded to fetch row, so all this stuff may be
CREATE TYPE play (video_id text, position int, last_time timestamp);
CREATE TABLE recent (user_name text PRIMARY KEY, play_list LIST<frozen<play>>);
You can easily retrieve play list for specific user by his ID. Instead of LIST you can use MAP, I don't think that for ten entries it matters.


Best regards, Vladimir Yudovin,
Winguzone<https://winguzone.com?from=list> - Hosted Cloud Cassandra
Launch your cluster in minutes.


---- On Tue, 08 Nov 2016 22:29:48 -0500ben ben <di...@outlook.com> wrote ----


Hi Vladimir Yudovin,


    Thank you very much for your detailed explaining. Maybe I didn't describe the requirement clearly. The use cases should be:

1. a user login our app.

2. show the recent ten movies watched by the user within 30 days.

3. the user can click any one of the ten movie and continue to watch from the last position she/he did. BTW, a movie can be watched several times by a user and the last positon is needed indeed.


BRs,

BEN

________________________________

发件人: Vladimir Yudovin <vl...@winguzone.com>>
发送时间: 2016年11月8日 22:35:48
收件人: user
主题: Re: A difficult data model with C*

Hi Ben,

if need very limited number of positions (as you said ten) may be you can store them in LIST of UDT? Or just as JSON string?
So you'll have one row per each pair user-video.

It can be something like this:

CREATE TYPE play (position int, last_time timestamp);
CREATE TABLE recent (user_name text, video_id text, review LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));

UPDATE recent set review = review + [(1234,12345)] where user_name='some user' AND video_id='great video';
UPDATE recent set review = review + [(1234,123456)] where user_name='some user' AND video_id='great video';
UPDATE recent set review = review + [(1234,1234567)] where user_name='some user' AND video_id='great video';

You can delete the oldest entry by index:
DELETE review[0] FROM recent WHERE user_name='some user' AND video_id='great video';

or by value, if you know the oldest entry:

UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some user' AND video_id='great video';

Best regards, Vladimir Yudovin,
Winguzone<https://winguzone.com?from=list> - Hosted Cloud Cassandra
Launch your cluster in minutes.


---- On Mon, 07 Nov 2016 21:54:08 -0500ben ben <di...@outlook.com>> wrote ----



Hi guys,

We are maintaining a system for an on-line video service. ALL users' viewing records of every movie are stored in C*. So she/he can continue to enjoy the movie from the last point next time. The table is designed as below:
CREATE TABLE recent (
user_name text,
vedio_id text,
position int,
last_time timestamp,
PRIMARY KEY (user_name, vedio_id)
)

It worked well before. However, the records increase every day and the last ten items may be adequate for the business. The current model use vedio_id as cluster key to keep a row for a movie, but as you know, the business prefer to order by the last_time desc. If we use last_time as cluster key, there will be many records for a singe movie and the recent one is actually desired. So how to model that? Do you have any suggestions?
Thanks!


BRs,
BEN

Re: 答复: 答复: A difficult data model with C*

Posted by Benjamin Roth <be...@jaumo.com>.

Yes sorry. Was irritated by the fact that Video id wasn't. Anyway probably
an mv could be a way to go.

Am 10.11.2016 13:38 schrieb "Carlos Alonso" <in...@mrcalonso.com>:

Hi Ben, you're right, but in my example the last_time timestamp field is
actually part of the primary key.

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 10 November 2016 at 11:50, Benjamin Roth <be...@jaumo.com> wrote:

> I pretty much guess the CQL you posted is invalid. You cannot set a
> clustering column that is not part of the primary key.
> But you can use a materialized view to append the last_time to the primary
> key and still preserver uniqueness of username + vedio_id (guess it is a
> typo in vedio).
>
> 2016-11-10 10:47 GMT+00:00 Carlos Alonso <in...@mrcalonso.com>:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record everytime a user watches a video and then
>> having a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin <vl...@winguzone.com>
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>> CassandraLaunch your cluster in minutes.*
>>>
>>>
>>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>> The solution maybe work. However, the play list will grow over time and
>>> somebody maybe has ten thousands that will slow down the query and sort .
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>>> *发送时间:* 2016年11月9日 18:11:26
>>> *收件人:* user
>>> *主题:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is demanded to fetch row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<frozen<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>>     Thank you very much for your detailed explaining. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user login our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movie and continue to watch
>>> from the last position she/he did. BTW, a movie can be watched several
>>> times by a user and the last positon is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>>> *发送时间:* 2016年11月8日 22:35:48
>>> *收件人:* user
>>> *主题:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if need very limited number of positions (as you said ten) may be you
>>> can store them in LIST of UDT? Or just as JSON string?
>>> So you'll have one row per each pair user-video.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>>
>>> Hi guys,
>>>
>>> We are maintaining a system for an on-line video service. ALL users'
>>> viewing records of every movie are stored in C*. So she/he can continue to
>>> enjoy the movie from the last point next time. The table is designed as
>>> below:
>>> CREATE TABLE recent (
>>> user_name text,
>>> vedio_id text,
>>> position int,
>>> last_time timestamp,
>>> PRIMARY KEY (user_name, vedio_id)
>>> )
>>>
>>> It worked well before. However, the records increase every day and the
>>> last ten items may be adequate for the business. The current model use
>>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>>> business prefer to order by the last_time desc. If we use last_time as
>>> cluster key, there will be many records for a singe movie and the recent
>>> one is actually desired. So how to model that? Do you have any suggestions?
>>> Thanks!
>>>
>>>
>>> BRs,
>>> BEN
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

Re: 答复: 答复: A difficult data model with C*

Posted by Carlos Alonso <in...@mrcalonso.com>.

Hi Ben, you're right, but in my example the last_time timestamp field is
actually part of the primary key.

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 10 November 2016 at 11:50, Benjamin Roth <be...@jaumo.com> wrote:

> I pretty much guess the CQL you posted is invalid. You cannot set a
> clustering column that is not part of the primary key.
> But you can use a materialized view to append the last_time to the primary
> key and still preserver uniqueness of username + vedio_id (guess it is a
> typo in vedio).
>
> 2016-11-10 10:47 GMT+00:00 Carlos Alonso <in...@mrcalonso.com>:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record everytime a user watches a video and then
>> having a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin <vl...@winguzone.com>
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>> CassandraLaunch your cluster in minutes.*
>>>
>>>
>>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>> The solution maybe work. However, the play list will grow over time and
>>> somebody maybe has ten thousands that will slow down the query and sort .
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>>> *发送时间:* 2016年11月9日 18:11:26
>>> *收件人:* user
>>> *主题:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is demanded to fetch row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<frozen<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>>     Thank you very much for your detailed explaining. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user login our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movie and continue to watch
>>> from the last position she/he did. BTW, a movie can be watched several
>>> times by a user and the last positon is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>>> *发送时间:* 2016年11月8日 22:35:48
>>> *收件人:* user
>>> *主题:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if need very limited number of positions (as you said ten) may be you
>>> can store them in LIST of UDT? Or just as JSON string?
>>> So you'll have one row per each pair user-video.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>>
>>> Hi guys,
>>>
>>> We are maintaining a system for an on-line video service. ALL users'
>>> viewing records of every movie are stored in C*. So she/he can continue to
>>> enjoy the movie from the last point next time. The table is designed as
>>> below:
>>> CREATE TABLE recent (
>>> user_name text,
>>> vedio_id text,
>>> position int,
>>> last_time timestamp,
>>> PRIMARY KEY (user_name, vedio_id)
>>> )
>>>
>>> It worked well before. However, the records increase every day and the
>>> last ten items may be adequate for the business. The current model use
>>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>>> business prefer to order by the last_time desc. If we use last_time as
>>> cluster key, there will be many records for a singe movie and the recent
>>> one is actually desired. So how to model that? Do you have any suggestions?
>>> Thanks!
>>>
>>>
>>> BRs,
>>> BEN
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

Re: 答复: 答复: A difficult data model with C*

Posted by Benjamin Roth <be...@jaumo.com>.

I pretty much guess the CQL you posted is invalid. You cannot set a
clustering column that is not part of the primary key.
But you can use a materialized view to append the last_time to the primary
key and still preserver uniqueness of username + vedio_id (guess it is a
typo in vedio).

2016-11-10 10:47 GMT+00:00 Carlos Alonso <in...@mrcalonso.com>:

> What about having something like
>
> CREATE TABLE user_views (
>   user_name text,
>   video_id text,
>   position int,
>   last_time timestamp,
>   PRIMARY KEY(user_name, last_time)
> ) WITH CLUSTERING ORDER BY (last_time DESC);
>
> Where you insert a record everytime a user watches a video and then having
> a batch task (every night maybe?) that deletes the extra rows that are not
> needed anymore.
> The query pattern for this is quite efficient as something like SELECT *
> FROM user_views WHERE user_name = ? LIMIT 10;
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 10 November 2016 at 09:19, Vladimir Yudovin <vl...@winguzone.com>
> wrote:
>
>> >Do you mean the oldest one should be removed when a new play is added?
>> Sure. As you described the issue "the last ten items may be adequate for
>> the business"
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>> CassandraLaunch your cluster in minutes.*
>>
>>
>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>
>> The solution maybe work. However, the play list will grow over time and
>> somebody maybe has ten thousands that will slow down the query and sort .
>> Do you mean the oldest one should be removed when a new play is added?
>>
>> BTW, the version is 2.1.16 in our live system.
>>
>>
>> BRs,
>>
>> BEN
>> ------------------------------
>>
>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>> *发送时间:* 2016年11月9日 18:11:26
>> *收件人:* user
>> *主题:* Re: 答复: A difficult data model with C*
>>
>> You are welcome! )
>>
>> >recent ten movies watched by the user within 30 days.
>> In this case you can't use PRIMARY KEY (user_name, video_id), as video_id
>> is demanded to fetch row, so all this stuff may be
>>
>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>> LIST<frozen<play>>);
>>
>> You can easily retrieve play list for specific user by his ID. Instead of
>> LIST you can use MAP, I don't think that for ten entries it matters.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben <diamond.ben@outlook.com
>> <di...@outlook.com>>* wrote ----
>>
>> Hi Vladimir Yudovin,
>>
>>
>>     Thank you very much for your detailed explaining. Maybe I didn't
>> describe the requirement clearly. The use cases should be:
>>
>> 1. a user login our app.
>>
>> 2. show the recent ten movies watched by the user within 30 days.
>>
>> 3. the user can click any one of the ten movie and continue to watch
>> from the last position she/he did. BTW, a movie can be watched several
>> times by a user and the last positon is needed indeed.
>>
>> BRs,
>>
>> BEN
>> ------------------------------
>>
>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>> *发送时间:* 2016年11月8日 22:35:48
>> *收件人:* user
>> *主题:* Re: A difficult data model with C*
>>
>> Hi Ben,
>>
>> if need very limited number of positions (as you said ten) may be you can
>> store them in LIST of UDT? Or just as JSON string?
>> So you'll have one row per each pair user-video.
>>
>> It can be something like this:
>>
>> CREATE TYPE play (position int, last_time timestamp);
>> CREATE TABLE recent (user_name text, video_id text, review
>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>
>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>> user' AND video_id='great video';
>> UPDATE recent set review = review + [(1234,123456)] where user_name='some
>> user' AND video_id='great video';
>> UPDATE recent set review = review + [(1234,1234567)] where
>> user_name='some user' AND video_id='great video';
>>
>> You can delete the oldest entry by index:
>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>> video_id='great video';
>>
>> or by value, if you know the oldest entry:
>>
>> UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some
>> user' AND video_id='great video';
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben <diamond.ben@outlook.com
>> <di...@outlook.com>>* wrote ----
>>
>>
>> Hi guys,
>>
>> We are maintaining a system for an on-line video service. ALL users'
>> viewing records of every movie are stored in C*. So she/he can continue to
>> enjoy the movie from the last point next time. The table is designed as
>> below:
>> CREATE TABLE recent (
>> user_name text,
>> vedio_id text,
>> position int,
>> last_time timestamp,
>> PRIMARY KEY (user_name, vedio_id)
>> )
>>
>> It worked well before. However, the records increase every day and the
>> last ten items may be adequate for the business. The current model use
>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>> business prefer to order by the last_time desc. If we use last_time as
>> cluster key, there will be many records for a singe movie and the recent
>> one is actually desired. So how to model that? Do you have any suggestions?
>> Thanks!
>>
>>
>> BRs,
>> BEN
>>
>>
>>
>>
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: 答复: 答复: A difficult data model with C*

Posted by Benjamin Roth <be...@jaumo.com>.

This is the reason why One would like to use an mv for it. An mv Handels
this. It adds a clustering Key while preserving uniqueness of the original
pk.

Am 11.11.2016 02:33 schrieb "Gang Liu" <ga...@gmail.com>:

> I guess orignal design is keep one record for one video per user. maybe
> their app will report many play records when user watching one video.
> So there will be many records when change primary key to (user_name,
> last_time). Also
> SELECT * FROM user_views WHERE user_name = ? LIMIT 10
> without group by video_id can't fit business requirement.
>
> regards,
> Gang
>
>
> On Thu, Nov 10, 2016 at 6:47 PM, Carlos Alonso <in...@mrcalonso.com> wrote:
>
>> What about having something like
>>
>> CREATE TABLE user_views (
>>   user_name text,
>>   video_id text,
>>   position int,
>>   last_time timestamp,
>>   PRIMARY KEY(user_name, last_time)
>> ) WITH CLUSTERING ORDER BY (last_time DESC);
>>
>> Where you insert a record everytime a user watches a video and then
>> having a batch task (every night maybe?) that deletes the extra rows that
>> are not needed anymore.
>> The query pattern for this is quite efficient as something like SELECT *
>> FROM user_views WHERE user_name = ? LIMIT 10;
>>
>> Regards
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 10 November 2016 at 09:19, Vladimir Yudovin <vl...@winguzone.com>
>> wrote:
>>
>>> >Do you mean the oldest one should be removed when a new play is added?
>>> Sure. As you described the issue "the last ten items may be adequate for
>>> the business"
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>> CassandraLaunch your cluster in minutes.*
>>>
>>>
>>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>> The solution maybe work. However, the play list will grow over time and
>>> somebody maybe has ten thousands that will slow down the query and sort .
>>> Do you mean the oldest one should be removed when a new play is added?
>>>
>>> BTW, the version is 2.1.16 in our live system.
>>>
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>>> *发送时间:* 2016年11月9日 18:11:26
>>> *收件人:* user
>>> *主题:* Re: 答复: A difficult data model with C*
>>>
>>> You are welcome! )
>>>
>>> >recent ten movies watched by the user within 30 days.
>>> In this case you can't use PRIMARY KEY (user_name, video_id), as
>>> video_id is demanded to fetch row, so all this stuff may be
>>>
>>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>>> LIST<frozen<play>>);
>>>
>>> You can easily retrieve play list for specific user by his ID. Instead
>>> of LIST you can use MAP, I don't think that for ten entries it matters.
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>> Hi Vladimir Yudovin,
>>>
>>>
>>>     Thank you very much for your detailed explaining. Maybe I didn't
>>> describe the requirement clearly. The use cases should be:
>>>
>>> 1. a user login our app.
>>>
>>> 2. show the recent ten movies watched by the user within 30 days.
>>>
>>> 3. the user can click any one of the ten movie and continue to watch
>>> from the last position she/he did. BTW, a movie can be watched several
>>> times by a user and the last positon is needed indeed.
>>>
>>> BRs,
>>>
>>> BEN
>>> ------------------------------
>>>
>>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>>> *发送时间:* 2016年11月8日 22:35:48
>>> *收件人:* user
>>> *主题:* Re: A difficult data model with C*
>>>
>>> Hi Ben,
>>>
>>> if need very limited number of positions (as you said ten) may be you
>>> can store them in LIST of UDT? Or just as JSON string?
>>> So you'll have one row per each pair user-video.
>>>
>>> It can be something like this:
>>>
>>> CREATE TYPE play (position int, last_time timestamp);
>>> CREATE TABLE recent (user_name text, video_id text, review
>>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>>
>>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>>> user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,123456)] where
>>> user_name='some user' AND video_id='great video';
>>> UPDATE recent set review = review + [(1234,1234567)] where
>>> user_name='some user' AND video_id='great video';
>>>
>>> You can delete the oldest entry by index:
>>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>>> video_id='great video';
>>>
>>> or by value, if you know the oldest entry:
>>>
>>> UPDATE recent SET review = review - [(1234,12345)]  WHERE
>>> user_name='some user' AND video_id='great video';
>>>
>>> Best regards, Vladimir Yudovin,
>>>
>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>> Launch your cluster in minutes.*
>>>
>>>
>>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben
>>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>>
>>>
>>> Hi guys,
>>>
>>> We are maintaining a system for an on-line video service. ALL users'
>>> viewing records of every movie are stored in C*. So she/he can continue to
>>> enjoy the movie from the last point next time. The table is designed as
>>> below:
>>> CREATE TABLE recent (
>>> user_name text,
>>> vedio_id text,
>>> position int,
>>> last_time timestamp,
>>> PRIMARY KEY (user_name, vedio_id)
>>> )
>>>
>>> It worked well before. However, the records increase every day and the
>>> last ten items may be adequate for the business. The current model use
>>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>>> business prefer to order by the last_time desc. If we use last_time as
>>> cluster key, there will be many records for a singe movie and the recent
>>> one is actually desired. So how to model that? Do you have any suggestions?
>>> Thanks!
>>>
>>>
>>> BRs,
>>> BEN
>>>
>>>
>>>
>>>
>>>
>>>
>>
>

Re: 答复: 答复: A difficult data model with C*

Posted by Gang Liu <ga...@gmail.com>.

I guess orignal design is keep one record for one video per user. maybe
their app will report many play records when user watching one video.
So there will be many records when change primary key to (user_name,
last_time). Also
SELECT * FROM user_views WHERE user_name = ? LIMIT 10
without group by video_id can't fit business requirement.

regards,
Gang


On Thu, Nov 10, 2016 at 6:47 PM, Carlos Alonso <in...@mrcalonso.com> wrote:

> What about having something like
>
> CREATE TABLE user_views (
>   user_name text,
>   video_id text,
>   position int,
>   last_time timestamp,
>   PRIMARY KEY(user_name, last_time)
> ) WITH CLUSTERING ORDER BY (last_time DESC);
>
> Where you insert a record everytime a user watches a video and then having
> a batch task (every night maybe?) that deletes the extra rows that are not
> needed anymore.
> The query pattern for this is quite efficient as something like SELECT *
> FROM user_views WHERE user_name = ? LIMIT 10;
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 10 November 2016 at 09:19, Vladimir Yudovin <vl...@winguzone.com>
> wrote:
>
>> >Do you mean the oldest one should be removed when a new play is added?
>> Sure. As you described the issue "the last ten items may be adequate for
>> the business"
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>> CassandraLaunch your cluster in minutes.*
>>
>>
>> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
>> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>>
>> The solution maybe work. However, the play list will grow over time and
>> somebody maybe has ten thousands that will slow down the query and sort .
>> Do you mean the oldest one should be removed when a new play is added?
>>
>> BTW, the version is 2.1.16 in our live system.
>>
>>
>> BRs,
>>
>> BEN
>> ------------------------------
>>
>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>> *发送时间:* 2016年11月9日 18:11:26
>> *收件人:* user
>> *主题:* Re: 答复: A difficult data model with C*
>>
>> You are welcome! )
>>
>> >recent ten movies watched by the user within 30 days.
>> In this case you can't use PRIMARY KEY (user_name, video_id), as video_id
>> is demanded to fetch row, so all this stuff may be
>>
>> CREATE TYPE play (video_id text, position int, last_time timestamp);
>> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
>> LIST<frozen<play>>);
>>
>> You can easily retrieve play list for specific user by his ID. Instead of
>> LIST you can use MAP, I don't think that for ten entries it matters.
>>
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben <diamond.ben@outlook.com
>> <di...@outlook.com>>* wrote ----
>>
>> Hi Vladimir Yudovin,
>>
>>
>>     Thank you very much for your detailed explaining. Maybe I didn't
>> describe the requirement clearly. The use cases should be:
>>
>> 1. a user login our app.
>>
>> 2. show the recent ten movies watched by the user within 30 days.
>>
>> 3. the user can click any one of the ten movie and continue to watch
>> from the last position she/he did. BTW, a movie can be watched several
>> times by a user and the last positon is needed indeed.
>>
>> BRs,
>>
>> BEN
>> ------------------------------
>>
>> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
>> *发送时间:* 2016年11月8日 22:35:48
>> *收件人:* user
>> *主题:* Re: A difficult data model with C*
>>
>> Hi Ben,
>>
>> if need very limited number of positions (as you said ten) may be you can
>> store them in LIST of UDT? Or just as JSON string?
>> So you'll have one row per each pair user-video.
>>
>> It can be something like this:
>>
>> CREATE TYPE play (position int, last_time timestamp);
>> CREATE TABLE recent (user_name text, video_id text, review
>> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>>
>> UPDATE recent set review = review + [(1234,12345)] where user_name='some
>> user' AND video_id='great video';
>> UPDATE recent set review = review + [(1234,123456)] where user_name='some
>> user' AND video_id='great video';
>> UPDATE recent set review = review + [(1234,1234567)] where
>> user_name='some user' AND video_id='great video';
>>
>> You can delete the oldest entry by index:
>> DELETE review[0] FROM recent WHERE user_name='some user' AND
>> video_id='great video';
>>
>> or by value, if you know the oldest entry:
>>
>> UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some
>> user' AND video_id='great video';
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben <diamond.ben@outlook.com
>> <di...@outlook.com>>* wrote ----
>>
>>
>> Hi guys,
>>
>> We are maintaining a system for an on-line video service. ALL users'
>> viewing records of every movie are stored in C*. So she/he can continue to
>> enjoy the movie from the last point next time. The table is designed as
>> below:
>> CREATE TABLE recent (
>> user_name text,
>> vedio_id text,
>> position int,
>> last_time timestamp,
>> PRIMARY KEY (user_name, vedio_id)
>> )
>>
>> It worked well before. However, the records increase every day and the
>> last ten items may be adequate for the business. The current model use
>> vedio_id as cluster key to keep a row for a movie, but as you know, the
>> business prefer to order by the last_time desc. If we use last_time as
>> cluster key, there will be many records for a singe movie and the recent
>> one is actually desired. So how to model that? Do you have any suggestions?
>> Thanks!
>>
>>
>> BRs,
>> BEN
>>
>>
>>
>>
>>
>>
>

Re: 答复: 答复: A difficult data model with C*

Posted by Carlos Alonso <in...@mrcalonso.com>.

What about having something like

CREATE TABLE user_views (
  user_name text,
  video_id text,
  position int,
  last_time timestamp,
  PRIMARY KEY(user_name, last_time)
) WITH CLUSTERING ORDER BY (last_time DESC);

Where you insert a record everytime a user watches a video and then having
a batch task (every night maybe?) that deletes the extra rows that are not
needed anymore.
The query pattern for this is quite efficient as something like SELECT *
FROM user_views WHERE user_name = ? LIMIT 10;

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 10 November 2016 at 09:19, Vladimir Yudovin <vl...@winguzone.com> wrote:

> >Do you mean the oldest one should be removed when a new play is added?
> Sure. As you described the issue "the last ten items may be adequate for
> the business"
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
> CassandraLaunch your cluster in minutes.*
>
>
> ---- On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben
> <diamond.ben@outlook.com <di...@outlook.com>>* wrote ----
>
> The solution maybe work. However, the play list will grow over time and
> somebody maybe has ten thousands that will slow down the query and sort .
> Do you mean the oldest one should be removed when a new play is added?
>
> BTW, the version is 2.1.16 in our live system.
>
>
> BRs,
>
> BEN
> ------------------------------
>
> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
> *发送时间:* 2016年11月9日 18:11:26
> *收件人:* user
> *主题:* Re: 答复: A difficult data model with C*
>
> You are welcome! )
>
> >recent ten movies watched by the user within 30 days.
> In this case you can't use PRIMARY KEY (user_name, video_id), as video_id
> is demanded to fetch row, so all this stuff may be
>
> CREATE TYPE play (video_id text, position int, last_time timestamp);
> CREATE TABLE recent (user_name text PRIMARY KEY, play_list
> LIST<frozen<play>>);
>
> You can easily retrieve play list for specific user by his ID. Instead of
> LIST you can use MAP, I don't think that for ten entries it matters.
>
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> Launch your cluster in minutes.*
>
>
> ---- On Tue, 08 Nov 2016 22:29:48 -0500*ben ben <diamond.ben@outlook.com
> <di...@outlook.com>>* wrote ----
>
> Hi Vladimir Yudovin,
>
>
>     Thank you very much for your detailed explaining. Maybe I didn't
> describe the requirement clearly. The use cases should be:
>
> 1. a user login our app.
>
> 2. show the recent ten movies watched by the user within 30 days.
>
> 3. the user can click any one of the ten movie and continue to watch from
> the last position she/he did. BTW, a movie can be watched several times by
> a user and the last positon is needed indeed.
>
> BRs,
>
> BEN
> ------------------------------
>
> *发件人:* Vladimir Yudovin <vl...@winguzone.com>
> *发送时间:* 2016年11月8日 22:35:48
> *收件人:* user
> *主题:* Re: A difficult data model with C*
>
> Hi Ben,
>
> if need very limited number of positions (as you said ten) may be you can
> store them in LIST of UDT? Or just as JSON string?
> So you'll have one row per each pair user-video.
>
> It can be something like this:
>
> CREATE TYPE play (position int, last_time timestamp);
> CREATE TABLE recent (user_name text, video_id text, review
> LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));
>
> UPDATE recent set review = review + [(1234,12345)] where user_name='some
> user' AND video_id='great video';
> UPDATE recent set review = review + [(1234,123456)] where user_name='some
> user' AND video_id='great video';
> UPDATE recent set review = review + [(1234,1234567)] where user_name='some
> user' AND video_id='great video';
>
> You can delete the oldest entry by index:
> DELETE review[0] FROM recent WHERE user_name='some user' AND
> video_id='great video';
>
> or by value, if you know the oldest entry:
>
> UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some
> user' AND video_id='great video';
>
> Best regards, Vladimir Yudovin,
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
> Launch your cluster in minutes.*
>
>
> ---- On Mon, 07 Nov 2016 21:54:08 -0500*ben ben <diamond.ben@outlook.com
> <di...@outlook.com>>* wrote ----
>
>
> Hi guys,
>
> We are maintaining a system for an on-line video service. ALL users'
> viewing records of every movie are stored in C*. So she/he can continue to
> enjoy the movie from the last point next time. The table is designed as
> below:
> CREATE TABLE recent (
> user_name text,
> vedio_id text,
> position int,
> last_time timestamp,
> PRIMARY KEY (user_name, vedio_id)
> )
>
> It worked well before. However, the records increase every day and the
> last ten items may be adequate for the business. The current model use
> vedio_id as cluster key to keep a row for a movie, but as you know, the
> business prefer to order by the last_time desc. If we use last_time as
> cluster key, there will be many records for a singe movie and the recent
> one is actually desired. So how to model that? Do you have any suggestions?
> Thanks!
>
>
> BRs,
> BEN
>
>
>
>
>
>

Re: 答复: 答复: A difficult data model with C*

Posted by Vladimir Yudovin <vl...@winguzone.com>.

&gt;Do you mean the oldest one should be removed when a new play is added?

Sure. As you described the issue "the last ten items may be adequate for the business"



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





---- On Wed, 09 Nov 2016 20:47:05 -0500Diamond ben &lt;diamond.ben@outlook.com&gt; wrote ----




The solution maybe work. However, the play list will grow over time and somebody maybe has ten thousands that will slow down the query and sort . Do you mean the oldest one should be removed when a new play is added?

BTW, the version is 2.1.16 in our live system.



BRs,

BEN




发件人: Vladimir Yudovin &lt;vladyu@winguzone.com&gt;
 发送时间: 2016年11月9日 18:11:26
 收件人: user
 主题: Re: 答复: A difficult data model with C* 
 


You are welcome! )



&gt;recent ten movies watched by the user within 30 days.

In this case you can't use PRIMARY KEY (user_name, video_id), as video_id is demanded to fetch row, so all this stuff may be

CREATE TYPE play (video_id text, position int, last_time timestamp);

CREATE TABLE recent (user_name text PRIMARY KEY, play_list LIST&lt;frozen&lt;play&gt;&gt;);


You can easily retrieve play list for specific user by his ID. Instead of LIST you can use MAP, I don't think that for ten entries it matters.





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





---- On Tue, 08 Nov 2016 22:29:48 -0500ben ben &lt;diamond.ben@outlook.com&gt; wrote ----




Hi Vladimir Yudovin,



    Thank you very much for your detailed explaining. Maybe I didn't describe the requirement clearly. The use cases should be:

1. a user login our app.

2. show the recent ten movies watched by the user within 30 days.

3. the user can click any one of the ten movie and continue to watch from the last position she/he did. BTW, a movie can be watched several times by a user and the last positon is needed indeed.



BRs,

BEN




发件人: Vladimir Yudovin &lt;vladyu@winguzone.com&gt;
 发送时间: 2016年11月8日 22:35:48
 收件人: user
 主题: Re: A difficult data model with C*
 


Hi Ben,



if need very limited number of positions (as you said ten) may be you can store them in LIST of UDT? Or just as JSON string?

So you'll have one row per each pair user-video. 



It can be something like this:



CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent (user_name text, video_id text, review LIST&lt;frozen&lt;play&gt;&gt;, PRIMARY KEY (user_name, video_id));



UPDATE recent set review = review + [(1234,12345)] where user_name='some user' AND video_id='great video';

UPDATE recent set review = review + [(1234,123456)] where user_name='some user' AND video_id='great video';

UPDATE recent set review = review + [(1234,1234567)] where user_name='some user' AND video_id='great video';



You can delete the oldest entry by index:

DELETE review[0] FROM recent WHERE user_name='some user' AND video_id='great video';



or by value, if you know the oldest entry:



UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some user' AND video_id='great video';



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





---- On Mon, 07 Nov 2016 21:54:08 -0500ben ben &lt;diamond.ben@outlook.com&gt; wrote ----






Hi guys,



We are maintaining a system for an on-line video service. ALL users' viewing records of every movie are stored in C*. So she/he can continue to enjoy the movie from the last point next time. The table is designed as below:

CREATE TABLE recent (

user_name text,

vedio_id text,

position int,

last_time timestamp,

PRIMARY KEY (user_name, vedio_id)

)



It worked well before. However, the records increase every day and the last ten items may be adequate for the business. The current model use vedio_id as cluster key to keep a row for a movie, but as you know, the business prefer to order by the last_time desc. If we use last_time as cluster key, there will be many records for a singe movie and the recent one is actually desired. So how to model that? Do you have any suggestions? 

Thanks!





BRs,

BEN