Posted to user@cassandra.apache.org by anil_ah <an...@yahoo.co.in> on 2016/03/02 02:11:53 UTC

DATA replication from Oracle DB to Cassandra

    
Hi,

I want to run a Spark job to do an incremental sync from Oracle to Cassandra; the job interval could be one minute. We are looking for near-real-time replication with a latency of 1 or 2 minutes.

Please advise what would be the best approach:

1) Oracle DB -> Spark SQL -> Spark -> Cassandra
2) Oracle DB -> Sqoop -> Cassandra

Please advise which option is better in terms of scalability, incremental loading, etc.

Regards,
Anil



Re: DATA replication from Oracle DB to Cassandra

Posted by Hannu Kröger <hk...@gmail.com>.
Hi,

I once implemented one-way replication from an RDBMS to Cassandra using triggers on the source database side. If you timestamp the changes at the source, you can write them with the same timestamps on the Cassandra side, and that takes care of most of the ordering of the changes, assuming that your data model doesn’t change too much.

In practice:
- Triggers push change events to a commit log, which is then pushed to a queue
- Readers on the Cassandra side read the events from the queue and write them to Cassandra with the timestamp from the change event
- Cassandra handles the ordering of change events
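
As a rough illustration of the reader step above, here is a minimal Python sketch that turns a change event into a CQL statement carrying the source timestamp via USING TIMESTAMP. The ChangeEvent shape, its field names, and the to_cql helper are assumptions made up for this example, not part of the setup described in this thread. The %s placeholders follow the style of the DataStax Python driver for non-prepared statements, and the timestamp is assumed to be in microseconds since the epoch, which is the usual CQL convention.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change event as a trigger might emit it."""
    table: str               # target Cassandra table (assumed to mirror the source table)
    op: str                  # "upsert" or "delete"
    key: Dict[str, Any]      # primary key columns
    values: Dict[str, Any]   # non-key columns (empty for deletes)
    ts_micros: int           # change time at the source, in microseconds


def to_cql(event: ChangeEvent) -> Tuple[str, List[Any]]:
    """Build a CQL statement and its parameters for one change event.

    The source timestamp is written explicitly so that Cassandra,
    not the arrival order, decides which change wins.
    """
    if event.op == "delete":
        where = " AND ".join(f"{col} = %s" for col in event.key)
        stmt = (f"DELETE FROM {event.table} "
                f"USING TIMESTAMP {event.ts_micros} WHERE {where}")
        return stmt, list(event.key.values())

    cols = {**event.key, **event.values}
    names = ", ".join(cols)
    marks = ", ".join(["%s"] * len(cols))
    stmt = (f"INSERT INTO {event.table} ({names}) VALUES ({marks}) "
            f"USING TIMESTAMP {event.ts_micros}")
    return stmt, list(cols.values())
```

With a connected driver session, the pair returned by to_cql could then be passed to session.execute(stmt, params). Table and column names are interpolated directly here, so this sketch assumes they come from a trusted mapping rather than from the event payload itself.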

Using timestamps, you can resend changes, read events in any order, etc. If you screw up the replication somehow (we did many times), it is easy to just create a dump on the source and load it in again with timestamps, so the system keeps running the whole time.
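
To see why resending and reordering are harmless, here is a small pure-Python simulation of last-write-wins by timestamp. It is only a toy model of the behaviour being relied on, not Cassandra itself: the apply_events helper and the (key, value, timestamp) tuple shape are invented for this example, and it assumes that two different changes to the same key never share a timestamp.

```python
from typing import Any, Dict, Iterable, Tuple

# (key, value, timestamp) - a hypothetical, simplified change event
Event = Tuple[str, Any, int]


def apply_events(events: Iterable[Event]) -> Dict[str, Any]:
    """Apply events with last-write-wins semantics per key.

    A write only takes effect if its timestamp is newer than the one
    already stored, so duplicates and late arrivals are ignored.
    """
    store: Dict[str, Tuple[Any, int]] = {}
    for key, value, ts in events:
        current = store.get(key)
        if current is None or ts > current[1]:
            store[key] = (value, ts)
    # Drop the timestamps and return only the visible values.
    return {key: value for key, (value, _) in store.items()}


events = [
    ("user:1", "alice", 100),
    ("user:1", "alice.smith", 200),
    ("user:2", "bob", 150),
]

in_order = apply_events(events)
reversed_order = apply_events(reversed(events))
with_resend = apply_events(events + events[:1])  # first change is sent twice
```

All three results are the same mapping, with user:1 ending up as "alice.smith". That is the property that makes reloading a timestamped dump on top of a running system safe.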

This way it’s possible to achieve quite low latency (seconds, not minutes) for the replication.

Cheers,
Hannu
