Posted to issues@nifi.apache.org by "Matt Burgess (JIRA)" <ji...@apache.org> on 2017/04/15 03:10:41 UTC

[jira] [Reopened] (NIFI-3413) Implement a CaptureChangeMySQL processor

     [ https://issues.apache.org/jira/browse/NIFI-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess reopened NIFI-3413:
--------------------------------

The MySQL binlog event for inserting new rows is called a "write", but in generic SQL/RDBMS terminology the operation is usually called an "insert". Reopening this case to change the externally facing properties, attributes, etc. from "write" to "insert".
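
To illustrate the rename, here is a minimal sketch of translating binlog-side operation names into the generic SQL names exposed to users. The mapping table, the function name, and the lowercase string values are all assumptions made for illustration; they are not taken from the processor's code or from any binlog client library.

```python
# Hypothetical mapping from binlog-style operation names to the
# externally facing names. Only "write" actually changes.
BINLOG_TO_EXTERNAL = {
    "write": "insert",
    "update": "update",
    "delete": "delete",
}


def external_event_type(binlog_event_type: str) -> str:
    """Return the generic SQL name for a binlog-style operation name.

    Unknown names are passed through (normalized to lowercase) so that
    the translation never hides an event type it does not know about.
    """
    key = binlog_event_type.strip().lower()
    return BINLOG_TO_EXTERNAL.get(key, key)
```

With this in place, internal handling can keep the binlog naming while anything user-visible goes through the translation.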

> Implement a CaptureChangeMySQL processor
> ----------------------------------------
>
>                 Key: NIFI-3413
>                 URL: https://issues.apache.org/jira/browse/NIFI-3413
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>             Fix For: 1.2.0
>
>
> Database systems such as MySQL, Oracle, and SQL Server allow access to their transaction logs so that external clients can have a "change data capture" (CDC) capability. As an initial effort, I propose a CaptureChangeMySQL processor to enable this in NiFi. This would incorporate any APIs necessary for follow-on Jira cases to implement CDC processors for databases such as Oracle, SQL Server, PostgreSQL, etc.
> The processor would include properties needed for database connectivity (unless using a DBCPConnectionPool would suffice), as well as any needed to configure third-party clients (e.g., mysql-binlog-connector). It would also need to keep a "sequence ID" so that, for example, an EnforceOrder processor (NIFI-3414) could guarantee the order of CDC events for use cases such as replication. It will likely need State Management for that, and may need other facilities such as a DistributedMapCache in order to keep information (e.g., column names and types) that enriches the raw CDC events.
> The processor would accept no incoming connections (it is a "get" or source processor), would be intended to run on the primary node only as a single-threaded processor, and would generate a flow file for each operation (e.g., INSERT, UPDATE, DELETE) in one or more formats (e.g., JSON).
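
The "one flow file per operation, each carrying a sequence ID" idea described above can be sketched roughly as follows. This is an illustration only: the class name, the JSON field names, and the in-memory counter are assumptions, and the processor itself would presumably persist the sequence ID through State Management rather than hold it in memory.

```python
import itertools
import json


class ChangeEventEmitter:
    """Hypothetical helper that serializes one CDC operation per call."""

    def __init__(self, start: int = 0):
        # Monotonically increasing counter; a downstream consumer could
        # use it to restore the original order of events.
        self._sequence = itertools.count(start)

    def emit(self, operation: str, database: str, table: str, row: dict) -> str:
        """Return the JSON content for a single change event."""
        event = {
            "sequence_id": next(self._sequence),
            "type": operation.lower(),
            "database": database,
            "table_name": table,
            "columns": row,
        }
        return json.dumps(event, sort_keys=True)
```

Each call to `emit` stands in for the content of one flow file; because the counter only ever increases, two events for the same row can be replayed in the order they were captured.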



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)