Posted to dev@nifi.apache.org by Bimal Mehta <bi...@gmail.com> on 2019/08/26 16:30:35 UTC

Variables to start a NiFi flow

Hi,

We have a data flow that extracts data from a source database table and
loads it into a target Hive table. The flow needs to run several times a
day to pick up delta records from the source table, and it needs to do so
for multiple tables. Rather than creating a separate data flow for each
source table, can we reuse the existing flow by passing it parameters,
such as the source table name, and then starting it? Basically, we are
looking for an interface where a user can pass the table names to load at
a given point in time and the flow is triggered for those tables. The
Variable Registry comes to mind, but I am not sure how to make it work
for this use case. We are using NiFi 1.9.0 as part of the CDF bundle.

Thanks
Bimal Mehta

Re: Variables to start a NiFi flow

Posted by Bimal Mehta <bi...@gmail.com>.
Thanks Peter.
The GenerateFlowFile option was a good fit for our case. We are also able
to trigger it from a shell script using curl.
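
In case it helps anyone else, here is roughly what our trigger looks
like, sketched in Python rather than raw curl. The host, port, and
process group ID are placeholders, and a secured instance would also
need an auth token:

import requests

NIFI_API = "http://nifi-host:8080/nifi-api"      # placeholder host/port
PG_ID = "abc12345-0123-1000-ffff-000000000000"   # placeholder process group id

# Start all components in the process group (same effect as our curl call).
resp = requests.put(
    f"{NIFI_API}/flow/process-groups/{PG_ID}",
    json={"id": PG_ID, "state": "RUNNING"},
)
resp.raise_for_status()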



Re: Variables to start a NiFi flow

Posted by Peter Turcsanyi <tu...@cloudera.com>.
Hi Bimal,

With the Variable Registry, you can implement it in the following way:
put your flow into a Process Group, use variable references in your
processors (e.g. ${db.table}), and define the variables at the process
group level. Then copy the process group (by simply copying it, or by
creating a template from it first) and set the variables to the proper
values in each copy. You can also configure separate scheduling in each
process group.
The drawback is that you need to duplicate your flow for each table.
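
If you prefer to set those variables programmatically rather than in the
UI, the process group's variable registry is also reachable over the
REST API. A rough Python sketch (host and ID are placeholders, and the
exact entity shape is easiest to copy from the GET response):

import requests

NIFI_API = "http://nifi-host:8080/nifi-api"      # placeholder host/port
PG_ID = "abc12345-0123-1000-ffff-000000000000"   # placeholder process group id

# Read the current registry; the update request must echo the group's revision.
reg = requests.get(f"{NIFI_API}/process-groups/{PG_ID}/variable-registry").json()

payload = {
    "processGroupRevision": reg["processGroupRevision"],
    "variableRegistry": {
        "processGroupId": PG_ID,
        "variables": [{"variable": {"name": "db.table", "value": "orders"}}],
    },
}
# Variable updates run asynchronously; this submits an update request.
resp = requests.post(
    f"{NIFI_API}/process-groups/{PG_ID}/variable-registry/update-requests",
    json=payload,
)
resp.raise_for_status()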

Another approach:
Define your flow only once and use FlowFile attributes instead of
Variable Registry variables.
Use GenerateFlowFile and add the FlowFile attributes via the dynamic
properties of this processor. Configure a separate GenerateFlowFile for
each of your source tables and connect them all to the same "SQL"
processor (which was the entry point earlier). Configure the scheduling
on these GenerateFlowFile processors.
The caveat is that not all "SQL" processors accept incoming FlowFiles:
ExecuteSQL(Record) and GenerateTableFetch work this way, but
QueryDatabaseTable does not.
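
To make the attribute part concrete: a dynamic property such as db.table
on GenerateFlowFile becomes a FlowFile attribute, and the downstream
ExecuteSQL query can then reference it, e.g. SELECT * FROM ${db.table}.
A sketch of setting that dynamic property over the REST API, again with
placeholder host and ID:

import requests

NIFI_API = "http://nifi-host:8080/nifi-api"       # placeholder host/port
PROC_ID = "def67890-0123-1000-ffff-000000000000"  # placeholder GenerateFlowFile id

# Fetch the processor entity first; the update must echo its current revision.
proc = requests.get(f"{NIFI_API}/processors/{PROC_ID}").json()

payload = {
    "revision": proc["revision"],
    "component": {
        "id": PROC_ID,
        "config": {
            # A dynamic property on GenerateFlowFile becomes a FlowFile
            # attribute, readable downstream as ${db.table}.
            "properties": {"db.table": "orders"},
        },
    },
}
resp = requests.put(f"{NIFI_API}/processors/{PROC_ID}", json=payload)
resp.raise_for_status()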

Regards,
Peter


Re: Variables to start a NiFi flow

Posted by Bronislav Jitnikov <sh...@gmail.com>.
Variable Registry is not what you need here, I think. In a similar case,
I generate FlowFiles that carry the table name to scan in an attribute
and pass them upstream into the SQL processor.
For example:
GenerateFlowFile (list of tables in the content; or read the list of
tables from a file, the Variable Registry, etc.)
SplitContent (split into one FlowFile per table)
ExtractText (copy the table name from the content into an attribute)
Then feed these into your process group as the upstream input to the SQL
processor, as in the sketch below.
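
To make the data shapes concrete, here is a plain-Python mock of what
each step does (illustrative only, these are not NiFi API calls):

# Illustrative mock of the flow above -- not NiFi API calls.
content = "customers\norders\ninvoices\n"    # what GenerateFlowFile emits

# SplitContent: one FlowFile per table name
pieces = [line for line in content.splitlines() if line]

# ExtractText: promote each FlowFile's content into a 'db.table' attribute
flowfiles = [{"content": t, "attributes": {"db.table": t}} for t in pieces]

for ff in flowfiles:
    print(ff["attributes"]["db.table"])      # what the SQL processor sees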

