Posted to users@nifi.apache.org by Márcio Sugar <fa...@ymail.com> on 2019/09/23 15:40:52 UTC

Setup vs. Ingestion tasks? Or how to create structures (tables) and load data (rows) in the same flow?

Hi,

I'm trying to replicate a number of tables from one database to another. I'd like the flow to take care of both the DDL ("create table if not exists ...") and DML ("insert") commands automatically. Ideally, the "create table" should be executed just once, before any insert for that same table is executed. 
I can use a Distributed Map Cache to know if a "create table" for each table was already performed or not, but the problem is that I don't know how to hold the "inserts" for that table until the "create table" is done. 
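The "hold the inserts until the create is done" behaviour can be sketched outside NiFi as follows. This is only an illustration in plain Python, not a NiFi processor: the class name `OncePerTableGate` and the `insert_row` helper are hypothetical, and an in-memory set stands in for the Distributed Map Cache.

```python
import threading


class OncePerTableGate:
    """Runs a setup action at most once per table; other callers wait for it."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two containers below
        self._locks = {}                # table name -> per-table lock
        self._done = set()              # tables whose setup has completed

    def ensure(self, table, setup):
        # Fetch (or create) the lock dedicated to this table.
        with self._guard:
            lock = self._locks.setdefault(table, threading.Lock())
        # Callers for the same table queue up here until setup has finished.
        with lock:
            with self._guard:
                done = table in self._done
            if not done:
                setup(table)
                with self._guard:
                    self._done.add(table)


created = []  # records each time the (stand-in) "create table" runs
gate = OncePerTableGate()


def insert_row(table, row):
    # created.append is a stand-in for executing the DDL statement.
    gate.ensure(table, created.append)
    # ... the real insert for `row` would go here ...


threads = [threading.Thread(target=insert_row, args=("customers", i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(created)
```

Eight concurrent "inserts" for the same table trigger the setup action a single time; every other caller blocks on the per-table lock until it has completed.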
I'm using "create table if not exists as select * from ...", so I'm trying to create the table and populate it at the same time with the data from that first row. It's not a pure "create table" without data because I couldn't find any processor that automatically maps the avro.schema to my database's DDL. I could use ExecuteScript for that, and then use "create table if not exists <table definition>", but how do I avoid running the create for every single row (or even for every flowfile, each containing many rows)? It would be great if I could run the "create table" just once per table, with or without data for the first "batch".
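For the schema-to-DDL mapping, a minimal standalone sketch might look like the following. It assumes the Avro schema is available as JSON text and contains only primitive types or nullable unions of them; the `AVRO_TO_SQL` table and both function names are hypothetical, and the SQL types are generic guesses that would need adjusting for the target database.

```python
import json

# Assumed mapping from Avro primitive types to generic SQL types.
AVRO_TO_SQL = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "bytes": "BLOB",
    "string": "VARCHAR(255)",
}


def column_ddl(field):
    """Builds one column definition from an Avro record field."""
    avro_type = field["type"]
    nullable = False
    # A union such as ["null", "string"] marks a nullable column.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        nullable = len(non_null) < len(avro_type)
        if len(non_null) != 1:
            raise ValueError("unsupported union: %r" % (avro_type,))
        avro_type = non_null[0]
    # Nested records, arrays and logical types are out of scope for this sketch.
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_SQL:
        raise ValueError("unsupported type: %r" % (avro_type,))
    suffix = "" if nullable else " NOT NULL"
    return "%s %s%s" % (field["name"], AVRO_TO_SQL[avro_type], suffix)


def create_table_ddl(schema_json):
    """Turns an Avro record schema (JSON text) into a CREATE TABLE statement."""
    schema = json.loads(schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    columns = ",\n  ".join(column_ddl(f) for f in schema["fields"])
    return "CREATE TABLE IF NOT EXISTS %s (\n  %s\n)" % (schema["name"], columns)


example_schema = json.dumps({
    "type": "record",
    "name": "customers",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"]},
    ],
})
print(create_table_ddl(example_schema))
```

Table and column names are inserted unquoted, so this is only suitable for trusted schemas such as those of a test database.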
It looks like a Setup task, if you know what I mean. I'm not sure if that is something that would fit how NiFi works, though. Wait and Notify don't look like an answer, either.
Probably I'd be better off considering the creation of the table structures as a one-time configuration task performed before the flow is first executed, but it would be cool to have everything automated using the same toolset, especially considering that new tables could be created at any time. (You may assume mine is a test database, so I don't actually need or want to enforce stricter control over what gets created. I just want it there, and maybe to be notified when something new comes up.)
Suggestions?
Thank you,

Marcio