Posted to users@nifi.apache.org by Boris Tyukin <bo...@boristyukin.com> on 2018/07/24 13:56:33 UTC

processing a bunch of rules in NiFi in real-time - do I need distributed cache?

Hi guys,

I could really use some advice. We need to replicate 300 tables from 3 Oracle
DBs into Apache Kudu. I'm thinking about doing this:

OracleDB1 -->
OracleDB2 --> Oracle GoldenGate --> Kafka --> NiFi 3 node cluster --> Kudu
OracleDB3 -->

GoldenGate will stream changes to the 300 tables into Kafka in near
real-time (I am not sure yet whether I want a single topic or one topic
per table).

NiFi will process messages from Kafka and upsert data into Kudu.

Now to my question/challenge. Every table has a bunch of columns, and I
need to apply certain rules (sketched after the list) before I move
records into Kudu:
1) adjust timezone for certain columns
2) convert data types
3) merge records from the 3 Oracle DBs into a single table and order
the final table's columns in a specific way
4) drop certain columns
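
For illustration, once my tool's output is loaded into a script, a
single column rule might look something like this (the field names here
are just an example, not my actual schema):

// One rule per (source table, column); a null targetColumn would mean
// "drop this column". Field names are illustrative only.
def rule = [
    sourceDb    : 'OracleDB1',
    sourceTable : 'ORDERS',
    sourceColumn: 'CREATED_TS',
    targetColumn: 'created_ts',
    targetType  : 'unixtime_micros',   // Kudu type to convert to
    timezone    : 'America/New_York',  // shift from this TZ to UTC
    ordinal     : 3                    // position in the final Kudu table
]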

Since there are 900 tables involved in total, I do not want to hardcode
any of these rules, so I already built a little tool that produces
rules for table columns based on some logic.

I store these rules in a MySQL database.

Technically I can now use NiFi to listen to the Kafka topic/s, fetch
these rules from the MySQL tables, dynamically build the transformation
logic in NiFi (either with the newer Record processors or with Groovy
scripting), and form a Kudu upsert statement (or use the PutKudu
processor); a rough sketch of the Groovy route follows.
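
For the Groovy route, I'm picturing something roughly like this in
ExecuteScript (the rules table, its column names, and the connection
details are invented for the example; in a real flow the connection
would come from a controller service, not a hardcoded URL):

import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import groovy.sql.Sql
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

// Hardcoded here only for the sketch; use DBCPConnectionPool in practice.
def sql = Sql.newInstance('jdbc:mysql://rulesdb:3306/cdc_rules',
                          'nifi', 'secret', 'com.mysql.jdbc.Driver')
try {
    // ConsumeKafka sets kafka.topic; assuming one topic per table here.
    def table = flowFile.getAttribute('kafka.topic')
    def rules = sql.rows('''SELECT source_column, target_column,
                                   target_type, timezone
                            FROM column_rules
                            WHERE source_table = ?
                            ORDER BY ordinal''', [table])

    flowFile = session.write(flowFile, { inStream, outStream ->
        def record = new JsonSlurper().parse(inStream)
        def out = [:]   // Groovy map keeps insertion order -> column order
        rules.each { r ->
            if (r.target_column == null) return   // rule: drop this column
            def value = record[r.source_column]
            // ... apply timezone shift / type conversion based on
            //     r.timezone and r.target_type here ...
            out[r.target_column] = value
        }
        outStream.write(JsonOutput.toJson(out).getBytes('UTF-8'))
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} finally {
    sql.close()
}

The Record-based alternative (ConsumeKafkaRecord + LookupRecord /
UpdateRecord + PutKudu) would avoid the hand-rolled JSON handling, but
the per-message lookup against MySQL is the same question either way.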

My concern, though, is that since there are 900 tables and the pipeline
runs in real time, I would have to make tons of requests to our MySQL
database.

I am wondering if that would work, or whether I need some caching
solution with NiFi (Redis?) to store these rules.
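
Before reaching for Redis, I was also wondering whether a simple
in-process cache on each node would be enough, since 900 tables' worth
of rules is tiny. Just an idea (loadAll is a stand-in for whatever
actually reads the rules from MySQL):

// Process-wide cache: table name -> list of column rules.
// Reloaded from MySQL at most once per TTL per node.
class RuleCache {
    static volatile Map<String, List<Map>> rules = [:]
    static volatile long loadedAt = 0L
    static final long TTL_MS = 5 * 60 * 1000   // refresh every 5 minutes
    private static final Object lock = new Object()

    static List<Map> rulesFor(String table, Closure loadAll) {
        if (System.currentTimeMillis() - loadedAt > TTL_MS) {
            synchronized (lock) {
                // double-checked so only one thread hits MySQL
                if (System.currentTimeMillis() - loadedAt > TTL_MS) {
                    rules = loadAll()   // swap the whole map atomically
                    loadedAt = System.currentTimeMillis()
                }
            }
        }
        rules[table] ?: []
    }
}

// In the script body, the per-message work becomes a map lookup
// (loadAllRulesFromMySql is hypothetical):
def tableRules = RuleCache.rulesFor(table) { loadAllRulesFromMySql() }

That would turn the per-message MySQL hit into one query per node every
few minutes; if per-node copies turn out not to be enough, NiFi's
DistributedMapCache services (or a Redis-backed one) would be the next
step.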

Appreciate any guidance!

Boris