Posted to users@nifi.apache.org by Boris Tyukin <bo...@boristyukin.com> on 2018/07/24 13:56:33 UTC
processing a bunch of rules in NiFi in real-time - do I need distributed cache?
Hi guys,
I could really use some advice. We need to replicate 300 tables from 3 Oracle
DBs into Apache Kudu. I'm thinking about doing this:
OracleDB1 -->
OracleDB2 --> Oracle GoldenGate --> Kafka --> NiFi 3 node cluster --> Kudu
OracleDB3 -->
GoldenGate will stream changes in 300 tables in near real-time to Kafka (I
am not sure if I want to use a single topic or one per table).
NiFi will process messages from Kafka and upsert data into Kudu.
Now to my question/challenge. Every table has a bunch of columns and I need
to apply certain rules before I move records into Kudu:
1) adjust timezone for certain columns
2) convert data types
3) merge records from 3 Oracle DBs into one single table and order final
table columns in a specific way
4) drop certain columns
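For illustration, here is a minimal Python sketch of how one rule set and its application to a record might look; the rule schema, column names, and the `apply_rules` helper are all my own assumptions, not anything from NiFi or the rules tool:

```python
from datetime import datetime, timedelta

# Hypothetical rule set for one table, as it might be loaded from MySQL.
rules = {
    "tz_shift": {"created_at": timedelta(hours=-5)},   # 1) adjust timezone
    "casts": {"amount": float},                        # 2) convert data types
    "column_order": ["id", "amount", "created_at"],    # 3) final column order
    "drop": ["internal_flag"],                         # 4) drop certain columns
}

def apply_rules(record, rules):
    rec = dict(record)
    for col in rules["drop"]:                 # 4) drop columns first
        rec.pop(col, None)
    for col, caster in rules["casts"].items():  # 2) type conversions
        if col in rec:
            rec[col] = caster(rec[col])
    for col, delta in rules["tz_shift"].items():  # 1) timezone shifts
        if col in rec:
            rec[col] = rec[col] + delta
    # 3) emit columns in the final, fixed order
    return {col: rec[col] for col in rules["column_order"]}

row = {"id": "1", "amount": "9.99", "internal_flag": "Y",
       "created_at": datetime(2018, 7, 24, 13, 0)}
clean = apply_rules(row, rules)
```

The same shape of logic could live in a Groovy ExecuteScript or be expressed with the Record processors; the point is only that the rules stay data, not code.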
Since there are 900 tables involved in total, I do not want to hardcode any of
these rules, so I already built a little tool that produces rules for
table columns based on some logic.
I store these rules in MySQL database.
Technically I can now use NiFi to listen to the Kafka topic(s), fetch these
rules from the MySQL tables, dynamically build some logic in NiFi
(either using the newer Record processors or Groovy scripting), and form a Kudu
upsert statement (or use the PutKudu processor).
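For what it's worth, forming the upsert dynamically from an already-transformed record could look roughly like this (Impala-style UPSERT syntax against a Kudu-backed table; the table name and helper are placeholders of mine):

```python
def build_upsert(table, record):
    # Build a parameterized Impala-style UPSERT for a Kudu-backed table
    # from a record whose column order was already fixed by the rules.
    cols = ", ".join(record.keys())
    placeholders = ", ".join(["%s"] * len(record))
    sql = f"UPSERT INTO {table} ({cols}) VALUES ({placeholders})"
    return sql, list(record.values())

sql, params = build_upsert("kudu_orders", {"id": 1, "amount": 9.99})
```

PutKudu would of course replace this hand-rolled SQL entirely; the sketch just shows that nothing table-specific needs to be hardcoded.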
My concern, though, is that with 900 tables and a pipeline running in
real-time, I would have to make tons of requests to our MySQL database.
I am wondering whether that would work, or whether I need some caching
solution with NiFi (Redis?) to store these rules.
Appreciate any guidance!
Boris