Posted to users@nifi.apache.org by "Carlos Manuel Fernandes (DSI)" <ca...@cgd.pt> on 2016/10/03 18:25:42 UTC

ELT on Nifi

Hi all,

When I saw NiFi for the first time, I tried to build a classical ETL/ELT flow, and this question is recurrent among new users.

NiFi has very good processors for Extract and Load; the problem arises with Transform, because ETL/ELT tools have specific "processors" (e.g. map, SCD) bound to DW concepts and sometimes bound to a specific database (e.g. SCDNetezza). The transform processors in NiFi are general purpose and not correlated with these concepts. The immediate solution is to create a lot of custom script processors, but then the ELT metadata (SQL) turns into processor attributes or code, which is not an ideal solution.

But if we put the Transform logic outside of NiFi, for example in some JSON structure, then it is relatively easy to construct an ELT NiFi template capable of running generic ELT flows.

Example of an ELT JSON structure (the "steps" inside each "flow" are to be executed by PutSQL in the same transaction):
{
       "Transformer": [{
             "name": "foo1",
             "type": "Map",
             "description": "Summarize the table foo from table bar",
             "flow": [{
                    "step": 1,
                    "description": "delete all data",
                    "stmt": "delete from foo"
             }, {
                    "step": 2,
                    "description": "Count f2 by f1",
                    "stmt": "insert into foo(c1, c2) select c1, sum(c2) from bar group by c1"
             }]
       }, {
             "name": "foo2",
             "type": "SCD - Slowly Changing Dimensions, Type 1",
             "description": "Update a prod table based on a stage table",
             "flow": [{
                    "step": 1,
                    "description": "Process type 1",
                    "stmt": "update Prod set Prod.columns = Stage.columns from Stage inner join Prod on Stage.key = Prod.key where Stage.IsType1 = 1"
             }]
       }]
}
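
For illustration, here is a rough Groovy sketch (ExecuteScript style) of the splitting step such a template needs - one flow file with one SQL statement per step, tagged so PutSQL's "Support Fragmented Transactions" mode can run all steps of a transformer in a single transaction. Everything beyond those fragment.* attribute names is just an assumption about how the flow is wired:

import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.OutputStreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

// Read the whole ELT descriptor from the incoming flow file
def json = ''
session.read(flowFile, { inputStream ->
    json = inputStream.getText(StandardCharsets.UTF_8.name())
} as InputStreamCallback)

new JsonSlurper().parseText(json).Transformer.each { transformer ->
    def steps = transformer.flow.sort { it.step }
    steps.eachWithIndex { step, i ->
        def child = session.create(flowFile)
        // The child flow file's content is the SQL statement for this step
        child = session.write(child, { out ->
            out.write(step.stmt.getBytes(StandardCharsets.UTF_8))
        } as OutputStreamCallback)
        // One fragment group per transformer: PutSQL can then execute all of
        // its steps in one transaction
        child = session.putAttribute(child, 'fragment.identifier', transformer.name)
        child = session.putAttribute(child, 'fragment.index', String.valueOf(i))
        child = session.putAttribute(child, 'fragment.count', String.valueOf(steps.size()))
        session.transfer(child, REL_SUCCESS)
    }
}
session.remove(flowFile)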

Example of a NiFi template that executes this JSON structure:

[inline image: screenshot of the NiFi template]


Does this make sense? Please give me feedback.

Carlos




RE: ELT on Nifi

Posted by "Carlos Manuel Fernandes (DSI)" <ca...@cgd.pt>.
Matt,

I think it is NiFi's orientation to build just “row” processors, which are simple in nature but permit a lot of complexity in combination; that is why ELT processors seemed, at first, not to fit this paradigm: they have “side effects” (reads/writes to the database). That's why I proposed putting the ELT logic outside of NiFi and using NiFi as a runtime engine, where the only side effect resides in PutSQL.

But your points make me think again; I need to work out how these processors would look and how they would fit together.
Matt, can you sketch a conceptual processor, for example replace-with-lookup? Which properties, controller services, attributes in/out, and what would its function be?

I think with an example in front of us, we reason better.
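
To make the question concrete, here is a rough Groovy (ExecuteScript style) sketch of the behavior I imagine - replace one delimited field with its value from a lookup table. The file path, delimiter, and field position are just placeholders:

import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

// Placeholder: a key=value file standing in for the real lookup source
// (a proper processor would likely use a controller service instead)
def lookup = [:]
new File('/etc/nifi/lookup.properties').eachLine { line ->
    def (key, value) = line.split('=', 2).toList()
    lookup[key] = value
}

flowFile = session.write(flowFile, { inputStream, outputStream ->
    inputStream.eachLine(StandardCharsets.UTF_8.name()) { row ->
        def fields = row.split(',')
        // Replace the second field with its looked-up value, if any
        if (fields.length > 1) fields[1] = lookup.getOrDefault(fields[1], fields[1])
        outputStream.write((fields.join(',') + '\n').getBytes(StandardCharsets.UTF_8))
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)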

Thanks

Carlos





RE: ELT on Nifi

Posted by "Carlos Manuel Fernandes (DSI)" <ca...@cgd.pt>.
Andy,

Good suggestion, I will do that; I have created several ExecuteScript processors (in Groovy) before.

Thanks

Carlos





Re: ELT on Nifi

Posted by Andy LoPresto <al...@apache.org>.
Carlos,

If you are comfortable with Groovy, I would suggest you look at using the ExecuteScript [1] processor to prototype what you want the processor to do. That processor takes a Groovy script (inline or read from a file) and executes it within the processor lifecycle. Matt Burgess has written some excellent blog posts on getting started with it [2][3].

Once you have that behaving the way you like (and feel free to continue to ask questions here), another developer would probably be able to help you convert it to a “real” custom processor.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.ExecuteScript/index.html
[2] https://funnifi.blogspot.com/2016/02/executescript-processor-hello-world.html
[3] https://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html
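
For reference, the minimal shape such a script takes - get a flow file from the session, change it, transfer it to a relationship (the 'greeting' attribute is just an illustrative name):

def flowFile = session.get()
if (!flowFile) return

// Do something with the flow file; here, just stamp an attribute on it
flowFile = session.putAttribute(flowFile, 'greeting', 'hello from ExecuteScript')
session.transfer(flowFile, REL_SUCCESS)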


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69



RE: Re: ELT on Nifi

Posted by João Henrique Freitas <jo...@gmail.com>.
Hi.

Maybe a linkedin/databus client processor could be created to handle ETL.

RE: Re: ELT on Nifi

Posted by "Carlos Manuel Fernandes (DSI)" <ca...@cgd.pt>.
Hi Uwe,

I saw you have developed a similar approach to mine. Joe Witt launched a challenge to build a processor based on the JSON structure I proposed.

I think we can use the code of the ConvertJSONToSQL processor as a template for this new processor. This new processor would belong to a JSON-to-SQL category (ConvertJSONToSQL being the first of them).

We can work together to reach this goal, but first we must agree on the JSON structure for the input.

What do you think? You can contact me directly.

Thanks

Carlos

From: Uwe Geercken [mailto:uwe.geercken@web.de]
Sent: terça-feira, 4 de Outubro de 2016 14:42
To: users@nifi.apache.org
Subject: Aw: Re: ELT on Nifi

Carlos,

I think that is a good point.

But I would like to bring up a slightly different view of it:

I have developed a business rule engine (open source) written in Java, and it is meanwhile in production at at least two bigger companies - they both use the Pentaho ETL tool together with the rule engine. You can use the rules to filter/evaluate conditions, and there are also actions which execute or transform data. The advantage is that within Pentaho it is just a plugin, and the business logic (or, if you will, also IT logic) is managed externally - through a web interface, possibly by users or superusers themselves and not by IT. This keeps a proper separation of responsibilities between business logic and IT logic, and the ETL process itself is much, much cleaner.

Likewise, one could think of creating a plugin for NiFi which takes a similar approach: you have a processor that in the background calls the rule engine; it runs and delivers the results back to the process. Instead of having complex connections between transformation processors, which clutter the NiFi canvas, there would be one processor for the rule engine (or of course also multiple ones).
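
Just to sketch the shape (the RowRuleEngine interface below is a made-up stand-in, not the real API of the engine), such a processor body could look roughly like this in Groovy:

import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

// Made-up stand-in for a rule engine API: apply a rule project to one row
interface RowRuleEngine {
    List<String> apply(List<String> row)
}

// Toy rules standing in for a real rule project: trim and uppercase each field
RowRuleEngine engine = { row -> row.collect { it.trim().toUpperCase() } } as RowRuleEngine

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    inputStream.eachLine(StandardCharsets.UTF_8.name()) { row ->
        // Run every row through the rule engine and write the result back out
        def result = engine.apply(row.split(';').toList())
        outputStream.write((result.join(';') + '\n').getBytes(StandardCharsets.UTF_8))
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)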

In one of my later projects I have implemented the complete invoicing process for the company I work for using the rule engine. The ETL is very clean and contains only IT logic (formatting of fields, splitting of fields, renaming, etc.), and the rest is in external rule projects which contain the business logic.

My thinking is that the division of responsibilities for the logic, and a clean ETL - or in the NiFi case a clean flow diagram - is a very strong argument for this approach.

Of course there is nothing to say against a mixed approach - custom processors and rule engine - I just wanted to explain my point a little bit. Everything is available on github.com/uwegeercken.

I could write the NiFi code for the processor, I guess, but I will need some help with testing, documentation, and also packaging the NAR file (I am not used to Maven and have struggled in the past to create a proper NAR archive).

Greetings,

Uwe





Re: ELT on Nifi

Posted by Matt Burgess <ma...@apache.org>.
Carlos,

The extensible nature of NiFi, whether the overall architecture was
intended for ETL/ELT and/or RDBMS/DW concepts or not, means that many of
these kinds of operations are welcome (but possibly not yet present) in
NiFi. Some might warrant framework changes, but for a good portion, many
RDBMS/DW processors are possible but just haven't been added/contributed
yet. In my experience, ETL/ELT tools have focused mainly on this kind of
"processor" and in contrast can't handle the level of throughput, data
formats, provenance/lineage, security, and/or data integrity that NiFi can.
In exchange, NiFi doesn't have as many of the RDBMS/DW-specific processors
available at this time. I see a few categories (please feel free to
add/change/delete/discuss), mostly having to do with tabular (row-oriented,
character-delimited) data:

1) Row-level operations. This includes projections (select fields from
row), alter fields (change timestamp of column 'last_updated', e.g.), add
column(s), replace-with-lookup, etc.
2) Table-level operations. This includes joins, grouping/aggregates,
transposition, etc.
3) Composition/Application of the other two. This includes normalization &
denormalization (star/snowflake schemas, e.g.), dimension updates
(Kimball's SCD Type 2, e.g.), etc.
4) Bulk Loading. These usually involve custom code (although in many cases
for NiFi you can deploy a command-line tool for bulk loading to a DB and
use ExecuteProcess or ExecuteStreamCommand to make it happen). These are
usually native processes for getting lots of data into the DB using an
end-run around their own interfaces, possibly bypassing mechanisms that
NiFi embraces, such as provenance. But they are often faster than their SQL
interface counterparts for large data ingest.
5) Transactions. This involves executing a number of SQL statements as an
atomic group (i.e. BEGIN, a bunch of INSERTs, COMMIT). Not all DBs support
this (and many have their own dialects for such things).
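
As a minimal illustration of item 5: grouped execution with plain JDBC
from Groovy (URL, credentials, and driver are placeholders for whatever
the target DB needs; PutSQL's fragmented transactions give roughly the
same effect inside NiFi):

import groovy.sql.Sql

// Placeholders: pick the URL, credentials, and driver for the target DB
def sql = Sql.newInstance('jdbc:postgresql://localhost:5432/dw',
                          'dw_user', 'dw_pass', 'org.postgresql.Driver')
try {
    // Either every statement commits, or the whole group rolls back
    sql.withTransaction {
        sql.execute('delete from foo')
        sql.execute('insert into foo(c1, c2) select c1, sum(c2) from bar group by c1')
    }
} finally {
    sql.close()
}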

That's a lot of feature surface to cover! Luckily we have an ever-growing
community filled with folks representing a whole spectrum of experience and
a shared passion for data :)  I am very interested in your thoughts on
where NiFi could improve on these (or other) fronts with respect to
ETL/ELT; I think we can get some good discussions (and code contributions!)
going on this. Alternatively, if you'd like to pursue a discussion on how
to offload data transformations, I'm sure the community has thoughts on
that as well.

Regards,
Matt

P.S. I didn't include push-down optimization on the list because of its
complexity and in NiFi terms involves things like dynamic flow-rewrites and
other magic that IMHO is against the design principles of NiFi itself
(simplicity, accountability, e.g.).


RE: ELT on Nifi

Posted by "Carlos Manuel Fernandes (DSI)" <ca...@cgd.pt>.
Hi Joe,

I can contribute the template whose image I sent before. As for building a processor, I'm not skilled enough in Java for that task; I mostly program in Groovy. If someone takes on that task, I can help with ideas and tests.

Thanks

Carlos




Re: ELT on Nifi

Posted by Joe Witt <jo...@gmail.com>.
Carlos,

I think you're right that more can be done to support a broad range of
transforms and styles of transforms.  The approach you're suggesting makes
sense for the style you prefer and I could envision such a processor that
can execute the transform/statements you're showing in that JSON sample.
Are you proposing to contribute such a processor?

Thanks
Joe
