Posted to user@cassandra.apache.org by ziju feng <pk...@gmail.com> on 2014/11/09 11:32:02 UTC

Plan to implement server-side synchronization of denormalized data?

Hi all,

I was wondering whether there is any plan to support automatically syncing
changes, on the server side, between an entity table and the tables that
contain its denormalized data.

I think many use cases in Cassandra require some level of denormalization.
However, there is currently little support for denormalization on the
server side. It has to be done by the driver or even by the application,
which leads to two issues:

1. Application complexity: As far as I know, no driver supports propagating
changes from the main entity to its denormalized copies, so users have to
handle data synchronization themselves. That can take a lot of code, and it
is quite hard to get right, considering things like which consistency level
to use, sync vs. async updates, reverse index tables, etc.

2. Data consistency: Suppose there is an entity table:
CREATE TABLE entity (
  id text PRIMARY KEY,
  name text,
  value text);
and an index table for 'name', which also stores 'value' for
denormalization:
CREATE TABLE name_idx (
  name text,
  id text,
  value text,
  PRIMARY KEY (name, id));  -- key assumed here; CQL requires one
When a request to update 'value' is sent to the application, it needs to
update both the entity and name_idx tables. Suppose another request to
update 'name' arrives at the same time; for that one the application needs
to delete the original row from name_idx and create a new row based on the
new name. However, if the 1st request does its read (it has to retrieve the
value of 'name' in order to update name_idx) before the 2nd request
finishes, its update statement will generate a row in name_idx with the
original name, which leaves the data inconsistent. CAS may help here, but
when the number of concurrent requests is large and there are more index
tables, CAS could fail frequently.
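
To make the interleaving in issue 2 concrete, here is a minimal Python
sketch. It is only a simulation under assumptions of mine: the two tables
are plain in-memory dicts rather than real Cassandra tables, name_idx is
keyed by (name, id), and the function names and the sample values
("alice", "bob", "v1", "v2") are made up for illustration.

```python
def read_name(entity, entity_id):
    """Request 1, step 1: read the current name so name_idx can be updated."""
    return entity[entity_id]["name"]


def write_value(entity, name_idx, entity_id, seen_name, new_value):
    """Request 1, step 2: write the new value to both tables.

    The name_idx write behaves like a CQL upsert: it creates the row
    if it does not exist, using whatever name was read earlier.
    """
    entity[entity_id]["value"] = new_value
    name_idx[(seen_name, entity_id)] = new_value


def rename(entity, name_idx, entity_id, new_name):
    """Request 2: change the name and move the index row accordingly."""
    row = entity[entity_id]
    old_name, value = row["name"], row["value"]
    row["name"] = new_name
    name_idx.pop((old_name, entity_id), None)   # delete the old index row
    name_idx[(new_name, entity_id)] = value     # insert the new index row


def simulate_race():
    # Hypothetical starting state: one entity and its index row.
    entity = {"e1": {"name": "alice", "value": "v1"}}
    name_idx = {("alice", "e1"): "v1"}

    seen = read_name(entity, "e1")                    # request 1 reads "alice"
    rename(entity, name_idx, "e1", "bob")             # request 2 completes
    write_value(entity, name_idx, "e1", seen, "v2")   # request 1 uses stale name
    return entity, name_idx


entity, name_idx = simulate_race()
print(entity)
print(name_idx)
```

With this ordering the entity ends up as name "bob" / value "v2", while
name_idx holds a resurrected ("alice", "e1") row carrying "v2" and a
("bob", "e1") row still carrying the old "v1", so neither index row matches
the entity.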

Since secondary indexes are limited in both performance and query
flexibility (no ORDER BY, for example), it would help applications a lot
if Cassandra supported server-maintained index tables on a main table,
maintained automatically just like secondary indexes are.

One possible syntax could be 'CREATE VIEW view_name ON table_name', with
the convention that each column in the view has the same name as in the
main table, so that users can create different views based on their query
requirements.
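
As a rough illustration of the behaviour I have in mind (not of how
Cassandra would implement it), here is a toy single-process Python model.
Everything in it is hypothetical: the BaseTable class, its create_view and
update methods, and the use of one lock to stand in for whatever
coordination a real distributed implementation would need.

```python
import threading


class BaseTable:
    """Toy base table that keeps its own index views in sync (hypothetical)."""

    def __init__(self):
        self._rows = {}    # id -> dict of column values
        self._views = []   # list of (key_column, view dict)
        self._lock = threading.Lock()

    def create_view(self, key_column):
        """Register a view keyed by (key_column value, id); columns keep their names."""
        view = {}
        with self._lock:
            for row_id, row in self._rows.items():
                if key_column in row:
                    view[(row[key_column], row_id)] = dict(row)
            self._views.append((key_column, view))
        return view

    def update(self, row_id, **changes):
        """Apply a write to the base row and to every view in one critical section."""
        with self._lock:
            old = self._rows.get(row_id)
            new = {**(old or {}), **changes}
            self._rows[row_id] = new
            for key_column, view in self._views:
                # Drop the view row filed under the old key, if there was one.
                if old is not None and key_column in old:
                    view.pop((old[key_column], row_id), None)
                # File the full, current row under the new key.
                if key_column in new:
                    view[(new[key_column], row_id)] = dict(new)


entity = BaseTable()
name_idx = entity.create_view("name")
entity.update("e1", name="alice", value="v1")
entity.update("e1", value="v2")   # the caller never has to read 'name' first
entity.update("e1", name="bob")   # the old index row is removed by the table itself
print(name_idx)
```

The point of the sketch is that the application only ever writes to the
main table; because the view row is derived from the base row inside the
same critical section, the read-then-write window on the client side
disappears. A real server-side version would of course have to deal with
replication and consistency levels, which this model ignores.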

Thanks,

Ziju