Posted to user@cassandra.apache.org by "Marcelo Valle (BLOOMBERG/ LONDON)" <mv...@bloomberg.net> on 2015/02/04 16:50:54 UTC

to normalize or not to normalize - read penalty vs write penalty

Hello everyone,

I am thinking about the architecture of my application on top of Cassandra, and I am asking myself whether or not I should normalize an entity.

I have users and alerts in my application, and each user has several alerts. The first model that came to mind was an "alerts" CF with user-id as part of the partition key. This way writes are fast, and reads are fast too, since I always read from a single partition.
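For illustration, a denormalized table along those lines might look like this (all names here are made up, not from my actual schema):

```cql
CREATE TABLE alerts_by_user (
    user_id  uuid,
    alert_id timeuuid,
    title    text,
    body     text,
    PRIMARY KEY (user_id, alert_id)
);

-- All of a user's alerts come back in one partition scan:
SELECT * FROM alerts_by_user WHERE user_id = ?;
```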

However, I received a requirement later that made my life more complicated. Alerts can be shared by 1000s of users and alerts can change. I am building a real time app and if I change an alert, all users related to it should see the change. 

Suppose I keep things denormalized: whenever an alert changes, I would need to write to 1000s of records, so my write performance would be affected every time I change an alert.
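In that denormalized layout, a single alert edit fans out roughly like this (hypothetical table and column names again):

```cql
-- Look up every user subscribed to the alert...
SELECT user_id FROM users_by_alert WHERE alert_id = ?;

-- ...then rewrite the alert row in each user's partition,
-- one statement per subscriber (1000s of writes for a popular alert):
UPDATE alerts_by_user
   SET title = ?, body = ?
 WHERE user_id = ? AND alert_id = ?;
```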

On the other hand, I could have a CF for users-alerts and another for alert details. Then, at read time, I would need to query 1000s of alerts for a given user.
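The normalized alternative would be something like this sketch (again, illustrative names only):

```cql
CREATE TABLE user_alerts (
    user_id  uuid,
    alert_id timeuuid,
    PRIMARY KEY (user_id, alert_id)
);

CREATE TABLE alert_details (
    alert_id timeuuid PRIMARY KEY,
    title    text,
    body     text
);

-- Read path: one partition scan, then one point read per alert:
SELECT alert_id FROM user_alerts WHERE user_id = ?;
SELECT * FROM alert_details WHERE alert_id = ?;  -- repeated 1000s of times
```

Here an alert update touches exactly one row, but every read pays for the per-alert lookups.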

In both situations, there is a gap between the time data is written and the time it's available to be read. 

I understand not normalizing will make me use more disk space, but once data is written once, I will be able to perform as many reads as I want to with no penalty in performance. Also, I understand writes are faster than reads in Cassandra, so the gap would be smaller in the first solution.

I would be glad to hear thoughts from the community.

Best regards,
Marcelo Valle.

Re: to normalize or not to normalize - read penalty vs write penalty

Posted by Tyler Hobbs <ty...@datastax.com>.
Roughly how often do you expect to update alerts?  How often do you expect
to read them?  I suspect you'll be doing 100x more reads (or more),
in which case optimizing for reads is definitely the right choice.
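That intuition can be put into a tiny back-of-envelope sketch. All the numbers below are hypothetical assumptions, not measurements:

```python
updates = 1            # alert updates in some time window
reads = 100 * updates  # assume reads outnumber updates 100:1
fanout = 1000          # users sharing one alert
alerts_per_user = 100  # alerts a single user fetches per read

# Denormalized: each update rewrites the alert in every subscriber's
# partition; each read is a single partition scan.
denorm_cost = updates * fanout + reads * 1

# Normalized: each update touches one row; each read does one point
# lookup per alert the user follows.
norm_cost = updates * 1 + reads * alerts_per_user

print(denorm_cost, norm_cost)  # 1100 10001
```

With a read-heavy ratio, the per-read fan-in of the normalized model dominates the per-update fan-out of the denormalized one.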

On Wed, Feb 4, 2015 at 9:50 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemilita@bloomberg.net> wrote:




-- 
Tyler Hobbs
DataStax <http://datastax.com/>