Posted to dev@cassandra.apache.org by Andrew Prudhomme <as...@yelp.com> on 2018/01/31 00:56:43 UTC

CDC usability and future development

Hi all,

We are currently designing a system that allows our Cassandra clusters to
produce a stream of data updates. Naturally, we have been evaluating whether
CDC can help with this, and we have run into several challenges in using it
for this purpose.

CDC exposes only the mutation rather than the full column value, and the
mutation alone tends to be of limited use for us. Applications might want to
know the full column value without having to issue a read back. We also see value in
being able to publish the full column value both before and after the
update. This is especially true when deleting a column since this stream
may be joined with others, or consumers may require other fields to
properly process the delete.
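
To make the before/after idea concrete, here is a rough sketch of the kind of
event we have in mind. Everything in it is hypothetical: ChangeEvent and
build_event are illustrative names, rows are modeled as plain dicts, and a
column value of None in the mutation stands in for a delete. None of this is
an existing Cassandra API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChangeEvent:
    # Hypothetical event shape, not a Cassandra type.
    table: str
    partition_key: str
    before: Optional[Dict[str, str]]  # row contents prior to the update
    after: Optional[Dict[str, str]]   # row contents once the update is applied


def build_event(table: str,
                partition_key: str,
                current_row: Optional[Dict[str, str]],
                mutation: Dict[str, Optional[str]]) -> ChangeEvent:
    """Combine the stored row with a mutation into a before/after event."""
    before = dict(current_row) if current_row is not None else None
    after = dict(current_row or {})
    for column, value in mutation.items():
        if value is None:
            # None models a column delete in this sketch.
            after.pop(column, None)
        else:
            after[column] = value
    # An empty result means no columns remain after the update.
    return ChangeEvent(table, partition_key, before, after or None)


event = build_event(
    "users", "42",
    {"email": "a@example.com", "region": "us"},
    {"email": None},
)
```

With an event like this, a consumer handling the delete of "email" still sees
"region" in the before image and can use it to join against other streams.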

Additionally, processing the CDC output itself poses some difficulties, such as:
- Updates not being immediately available (addressed by CASSANDRA-12148)
- Each node providing an independent stream of updates that must be
unified and deduplicated
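
As a minimal sketch of the second point, the following shows one way per-node
streams could be unified, assuming each update can be reduced to a comparable
record. The Update shape and the deduplicate function are hypothetical and not
part of Cassandra. The sketch assumes that replicas report the same write with
identical fields, and it keeps every seen update in memory, which a real
consumer would need to bound.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Update:
    # Hypothetical record shape, not a Cassandra type.
    table: str
    partition_key: str
    column: str
    write_timestamp: int     # timestamp attached to the write
    value: Optional[bytes]   # None models a delete in this sketch


def deduplicate(streams: Iterable[Iterable[Update]]) -> Iterator[Update]:
    """Yield each distinct update once across all per-node streams."""
    seen = set()  # unbounded here; a real consumer would expire old entries
    for stream in streams:
        for update in stream:
            if update not in seen:
                seen.add(update)
                yield update


# Two replicas report the same write to users/42; it is emitted only once.
node_a = [
    Update("users", "42", "email", 1000, b"a@example.com"),
    Update("users", "42", "email", 2000, None),
]
node_b = [
    Update("users", "42", "email", 1000, b"a@example.com"),
    Update("users", "7", "email", 1500, b"b@example.com"),
]
unified = list(deduplicate([node_a, node_b]))
```

Note that this only removes duplicates. It does not order updates across
nodes, which is a separate problem.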

Our question is: what is the vision for CDC development? The current
implementation could work for some use cases, but it is a long way from a
general streaming solution. I understand that the nature of Cassandra makes
this quite complicated, but are there any thoughts or desires on the future
direction of CDC?

Thanks