Posted to users@nifi.apache.org by Max <na...@gmail.com> on 2019/04/04 12:47:27 UTC

Reusing same flow for different database connections

Hello!

We are working on a project that requires importing data from tables across
different database servers (as in, different DB connection pools).

The data flow itself is the same across maybe 40-50 tables and around 10
connections. I tried to create an abstract flow that can be parameterized
by an incoming flow file in order to avoid duplication. My flow works like
this:

- A .json file is written to a folder; it contains something like
{"target": "some_table", "source": "other_table", "query": "SELECT * FROM
other_table", ...} (pretty-printed below, after this list)
- We convert the JSON key/value pairs to flow file attributes
- Select the data as records from the source table using the query attribute
- Store the data in bulk in the target table
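
For reference, pretty-printed, such a parameter file looks roughly like this
(the table names are just placeholders, and the trailing fields are elided):

  {
    "target": "some_table",
    "source": "other_table",
    "query": "SELECT * FROM other_table",
    ...
  }

Each key becomes a flow file attribute, so downstream processors can refer to
them with expression language such as ${target} or ${query}.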

This works well since we can use the parameters from the .json file in the
processors of the flow; I don't need to hardcode table names or the query
in the processors.

Where this approach breaks down: I can't parameterize the database
connection/the connection pool name. So in the end I would need to
duplicate the same flow ten times, once for each database.

Maybe I'm approaching this from the wrong direction. Is there a better way
to do what I want?

Max

Re: Reusing same flow for different database connections

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

Take a look at the DBCPConnectionPoolLookup service. It allows you to
register one or more connection pool services and then select one at
runtime, based on an incoming flow file carrying an attribute called
database.name.
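
As a rough sketch (the pool names here are made up; database.name is the
fixed attribute name the lookup service reads):

  DBCPConnectionPoolLookup            <- referenced by your SQL processors
    warehouse_a -> DBCPConnectionPool for server A
    warehouse_b -> DBCPConnectionPool for server B

Each dynamic property on the lookup service maps a name to a registered
DBCPConnectionPool. If the incoming flow file carries the attribute
database.name = warehouse_a, the lookup hands back the matching pool, so
processors like ExecuteSQLRecord or PutDatabaseRecord can point at the one
lookup service instead of a concrete pool. You could add the name as another
key in your .json file so it becomes an attribute along with the rest.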

Thanks,

Bryan
