Posted to user@cassandra.apache.org by SEGALIS Morgan <ms...@gmail.com> on 2015/01/22 20:19:03 UTC

Cassandra row ordering best practice Modeling

I have a column family that stores articles. I'll need to fetch those
articles from the most recent to the oldest, filtered by country, and of
course with the ability to limit the number of fetched articles.

I thought about another column family, "ArticlesByDateAndCountry", with
dynamic columns.

The key would be a mix of the 2-character country code (ISO 3166-1) and the
article's date, so something like US-20150118 or FR-20141230
(XX-YYYYMMDD).
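
As a rough illustration of the key format above, here is a minimal Python
sketch. The function name bucket_key is hypothetical and not part of any
Cassandra driver; it only builds the XX-YYYYMMDD string.

```python
from datetime import date


def bucket_key(country_code: str, day: date) -> str:
    """Build a row key such as 'US-20150118' (XX-YYYYMMDD).

    Hypothetical helper: the name and the validation rule are assumptions.
    """
    if len(country_code) != 2 or not country_code.isalpha():
        raise ValueError("expected a 2-letter ISO 3166-1 country code")
    # date objects accept strftime-style format specs inside f-strings
    return f"{country_code.upper()}-{day:%Y%m%d}"
```

For example, bucket_key("FR", date(2014, 12, 30)) gives "FR-20141230".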

In those rows, the column name would be the timeuuid of the article, and the
value would be the article's ID.

There would probably be about a thousand articles per day for each country.

Let's say I want to show only the 100 newest articles: I'll get today's
articles, and if that does not fill the request (too few articles), I'll
check the day before, and so on.

Is that the best practice, or does someone have a better idea for this
purpose?

Re: Cassandra row ordering best practice Modeling

Posted by DuyHai Doan <do...@gmail.com>.

You get it :D

 This is the real issue. However, it's quite an extreme case. If you can
guarantee that there will be a minimum of X articles per day and per
country, the maximum number of requests needed to fetch 100 articles will
be bounded.

 Furthermore, do not forget that a SELECT statement restricted by partition
key leverages bloom filters, so in the case of a true negative (no articles
for that day) Cassandra will not touch the disk.
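
To make the bound concrete: if every day is guaranteed to hold at least X
articles, fetching N articles needs at most ceil(N / X) bucket reads. A
small sketch of that arithmetic in Python (the function name and its
parameters are assumptions for illustration):

```python
import math


def max_bucket_queries(wanted: int, min_per_day: int) -> int:
    """Worst-case number of day buckets to read, assuming every day
    holds at least `min_per_day` articles for the country."""
    if wanted <= 0:
        return 0
    if min_per_day <= 0:
        # Without a guaranteed minimum there is no upper bound at all.
        raise ValueError("no upper bound without a guaranteed minimum per day")
    return math.ceil(wanted / min_per_day)
```

With at least 1 article per day, 100 articles need at most 100 reads; with
at least 30 per day, at most 4.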

On Thu, Jan 22, 2015 at 9:30 PM, SEGALIS Morgan <ms...@gmail.com> wrote:

> Oh yeah, I thought about it, and even raised the point in the first mail:
>
> "Let's say I want to show only the 100 newest articles: I'll get today's
> articles, and if that does not fill the request (too few articles), I'll
> check the day before, and so on."
>
> but your answer raised another issue I had not thought of before:
> - going back through previous days, let's say I want the 100 newest articles
> - if there is at most 1 article per day, and 0 on some days, I will have to
> run 100+ queries to get all the posts. Won't that be a little too much?
>
> --
> Morgan SEGALIS
>

Re: Cassandra row ordering best practice Modeling

Posted by SEGALIS Morgan <ms...@gmail.com>.

Oh yeah, I thought about it, and even raised the point in the first mail:

"Let's say I want to show only the 100 newest articles: I'll get today's
articles, and if that does not fill the request (too few articles), I'll
check the day before, and so on."

but your answer raised another issue I had not thought of before:
- going back through previous days, let's say I want the 100 newest articles
- if there is at most 1 article per day, and 0 on some days, I will have to
run 100+ queries to get all the posts. Won't that be a little too much?

2015-01-22 20:47 GMT+01:00 DuyHai Doan <do...@gmail.com>:

> Well, if the current day's bucket does not contain enough articles, you may
> need to search back in the previous day. If the previous day does not have
> any articles, you may need to go back another day ... and so on ...
>
>  Of course it's a corner case, but I've seen some code that misses this
> scenario and ends up in an infinite loop back in time ...


-- 
Morgan SEGALIS

Re: Cassandra row ordering best practice Modeling

Posted by DuyHai Doan <do...@gmail.com>.

Well, if the current day's bucket does not contain enough articles, you may
need to search back in the previous day. If the previous day does not have
any articles, you may need to go back another day ... and so on ...

 Of course it's a corner case, but I've seen some code that misses this
scenario and ends up in an infinite loop back in time ...
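
A minimal Python sketch of walking back through day buckets with a hard cap
on how far back to look, so that empty days cannot cause an endless loop.
The fetch_bucket callable and the max_days_back parameter are assumptions
for illustration: fetch_bucket stands in for whatever query reads one row,
and is expected to return article IDs newest first, or an empty list when
the row does not exist.

```python
from datetime import date, timedelta
from typing import Callable, List


def newest_articles(
    fetch_bucket: Callable[[str, int], List[str]],
    country: str,
    today: date,
    limit: int = 100,
    max_days_back: int = 30,
) -> List[str]:
    """Collect up to `limit` article IDs, newest first, reading at most
    `max_days_back` day buckets (today included)."""
    results: List[str] = []
    day = today
    for _ in range(max_days_back):  # the cap guarantees termination
        if len(results) >= limit:
            break
        key = f"{country}-{day:%Y%m%d}"  # XX-YYYYMMDD row key
        # Ask only for what is still missing; empty days return [].
        results.extend(fetch_bucket(key, limit - len(results)))
        day -= timedelta(days=1)
    return results
```

The cap trades completeness for a bounded number of reads: a country with
very few articles may return fewer than `limit` results.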

On Thu, Jan 22, 2015 at 8:41 PM, SEGALIS Morgan <ms...@gmail.com> wrote:

> Hi DuyHai,
>
> If there are 0 articles, the row will obviously not exist, I guess... (no
> article insertion will have created the row)
> What is bugging you exactly?
>
> --
> Morgan SEGALIS
>

Re: Cassandra row ordering best practice Modeling

Posted by SEGALIS Morgan <ms...@gmail.com>.

Hi DuyHai,

If there are 0 articles, the row will obviously not exist, I guess... (no
article insertion will have created the row)
What is bugging you exactly?

2015-01-22 20:33 GMT+01:00 DuyHai Doan <do...@gmail.com>:

> Hello Morgan
>
>  The data model looks reasonable. Bucketing by day will help you to scale.
> The only thing I can see is how to go back in time to fetch articles from
> previous buckets (previous days). Is it possible to have 0 articles for a
> country for a day?


-- 
Morgan SEGALIS

Re: Cassandra row ordering best practice Modeling

Posted by DuyHai Doan <do...@gmail.com>.

Hello Morgan

 The data model looks reasonable. Bucketing by day will help you to scale.
The only thing I can see is how to go back in time to fetch articles from
previous buckets (previous days). Is it possible to have 0 articles for a
country for a day?


On Thu, Jan 22, 2015 at 8:23 PM, SEGALIS Morgan <ms...@gmail.com> wrote:

> Sorry, I copied/pasted the question from another platform where you don't
> generally say hello.
>
> So: Hello everyone,
>
> --
> Morgan SEGALIS
>

Re: Cassandra row ordering best practice Modeling

Posted by SEGALIS Morgan <ms...@gmail.com>.

Sorry, I copied/pasted the question from another platform where you don't
generally say hello.

So: Hello everyone,


2015-01-22 20:19 GMT+01:00 SEGALIS Morgan <ms...@gmail.com>:

> I have a column family that stores articles. I'll need to fetch those
> articles from the most recent to the oldest, filtered by country, and of
> course with the ability to limit the number of fetched articles.
>
> I thought about another column family, "ArticlesByDateAndCountry", with
> dynamic columns.
>
> The key would be a mix of the 2-character country code (ISO 3166-1) and the
> article's date, so something like US-20150118 or FR-20141230
> (XX-YYYYMMDD).
>
> In those rows, the column name would be the timeuuid of the article, and
> the value would be the article's ID.
>
> There would probably be about a thousand articles per day for each country.
>
> Let's say I want to show only the 100 newest articles: I'll get today's
> articles, and if that does not fill the request (too few articles), I'll
> check the day before, and so on.
>
> Is that the best practice, or does someone have a better idea for this
> purpose?
>



-- 
Morgan SEGALIS