Posted to commits@samza.apache.org by "K. Rukshan Viduranga Perera (JIRA)" <ji...@apache.org> on 2016/02/12 04:56:18 UTC

[jira] [Commented] (SAMZA-200) Explore using MySQL changelog as input stream

    [ https://issues.apache.org/jira/browse/SAMZA-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143984#comment-15143984 ] 

K. Rukshan Viduranga Perera commented on SAMZA-200:
---------------------------------------------------

Is this still open? I would like to look into this as a GSoC project.

> Explore using MySQL changelog as input stream
> ---------------------------------------------
>
>                 Key: SAMZA-200
>                 URL: https://issues.apache.org/jira/browse/SAMZA-200
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Martin Kleppmann
>              Labels: gsoc2015, java, project
>
> Samza is designed with good support for database changelogs, but the current open source release is mostly centered around Kafka. It would be good to have out-of-the-box support for some common databases, such as MySQL, as well.
> [Databus|http://www.socc2012.org/s18-das.pdf?attredirects=0] is LinkedIn's change capture tool, but the current open source release focuses mainly on Oracle. There is an open source release of [Databus for MySQL|https://github.com/linkedin/databus/wiki/Databus-for-MySQL], but it's a proof-of-concept implementation, not the one used by LinkedIn in production. (The one used by LinkedIn requires a patched version of MySQL.) The open source Databus uses [Open Replicator|https://code.google.com/p/open-replicator/] to connect to a MySQL server as a slave, and parses the binlog to find any inserts, updates or deletes.
> I played around a bit with Open Replicator today and got it working: a small Scala program that could get a real-time feed of all changes happening in a MySQL database. However, I have some doubts about the quality of the library (the code is not very good, it has only very cursory tests, the original maintainer hasn't touched it for 18 months, and there are reports of nasty bugs, e.g. blowing up on any negative number). There don't seem to be any better Java binlog parsers out there. But I did skim the source of Open Replicator, and it's not too complicated; it seems quite feasible to write a MySQL binlog parser ourselves.
> This is still very much at an exploratory stage, but I think it could be really cool to have database changelog support easily available in Samza.
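This is a rough illustration of where a hand-written binlog parser might start. It is a minimal Java sketch and assumes the v4 binlog format used by MySQL 5.0 and later.

The sketch does two things:
- It decodes the 19-byte common header that precedes every binlog event.
- It maps the row-event type codes (WRITE_ROWS, UPDATE_ROWS and DELETE_ROWS, in their v1 and v2 forms) to inserts, updates and deletes.

The class and method names are hypothetical. The sketch does not connect to the server as a replication slave. It also does not decode the row images themselves, which additionally requires the preceding TABLE_MAP_EVENT. Java has no unsigned integer types, so each header field is widened explicitly.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: decode the 19-byte common header that precedes every
// event in a MySQL v4 binlog, then classify row-based change events.
// Row bodies (column values) are not decoded here.
class BinlogHeaderSketch {
    static final int HEADER_LENGTH = 19;

    enum ChangeKind { INSERT, UPDATE, DELETE, OTHER }

    // Fields of the common event header; on disk every integer is little-endian.
    static final class EventHeader {
        final long timestamp;    // seconds since the epoch, 4 bytes
        final int typeCode;      // event type, 1 byte
        final long serverId;     // id of the originating server, 4 bytes
        final long eventLength;  // total event size including this header, 4 bytes
        final long nextPosition; // file offset of the next event, 4 bytes
        final int flags;         // 2 bytes

        EventHeader(long timestamp, int typeCode, long serverId,
                    long eventLength, long nextPosition, int flags) {
            this.timestamp = timestamp;
            this.typeCode = typeCode;
            this.serverId = serverId;
            this.eventLength = eventLength;
            this.nextPosition = nextPosition;
            this.flags = flags;
        }
    }

    static EventHeader parseHeader(byte[] bytes) {
        if (bytes.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("binlog event header needs " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, 0, HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        // Widen each field so values above the signed range are not read as negative.
        long timestamp = Integer.toUnsignedLong(buf.getInt());
        int typeCode = Byte.toUnsignedInt(buf.get());
        long serverId = Integer.toUnsignedLong(buf.getInt());
        long eventLength = Integer.toUnsignedLong(buf.getInt());
        long nextPosition = Integer.toUnsignedLong(buf.getInt());
        int flags = Short.toUnsignedInt(buf.getShort());
        return new EventHeader(timestamp, typeCode, serverId, eventLength, nextPosition, flags);
    }

    // Maps row-event type codes to change kinds: 23-25 are the v1 row events,
    // 30-32 the v2 row events written by newer servers.
    static ChangeKind classify(int typeCode) {
        switch (typeCode) {
            case 23: case 30: return ChangeKind.INSERT; // WRITE_ROWS_EVENT
            case 24: case 31: return ChangeKind.UPDATE; // UPDATE_ROWS_EVENT
            case 25: case 32: return ChangeKind.DELETE; // DELETE_ROWS_EVENT
            default: return ChangeKind.OTHER;
        }
    }
}
```

When reading a binlog file, the caller would skip the 4-byte magic number (0xfe 'b' 'i' 'n') at the start. It would then pass the first 19 bytes of each event to parseHeader and use eventLength to step to the next event.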



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)