Posted to users@nifi.apache.org by "Paul Gibeault (pagibeault)" <pa...@micron.com> on 2018/08/17 14:08:57 UTC

RE: [EXT] Design pattern advice needed

Bob,
  Even if you were able to manually create the NiFi flows for all 200 tables successfully, you may want to make some changes to those flows down the road.  You would then have to perform 200 manual changes again.

This may be mitigated somewhat by breaking the flows into two pieces: Acquisition and Ingestion.  You would have 200 unique Acquisition flows feeding a single shared Ingestion flow, and you would rely on Process Group variables and flow file attributes to parameterize it.  For example, a single PutDatabaseRecord whose Table Name property is set to ${tablename} via Expression Language could serve all 200 tables.

I think your answer is automation.  NiFi provides a REST API<https://nifi.apache.org/docs/nifi-docs/rest-api/index.html> for any action you can take in the UI.  This means you can write automation software to analyze the data sources and dynamically build the NiFi flows.
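For instance, here is a minimal Python sketch that creates one process group per source table through the REST API.  The unsecured localhost instance and the table names are assumptions for illustration:

    import requests

    NIFI = "http://localhost:8080/nifi-api"   # assumption: unsecured dev instance
    tables = ["customers", "orders"]          # stand-ins for the ~200 table names

    # The root process group id comes back in the ProcessGroupFlowEntity
    root_id = requests.get(NIFI + "/flow/process-groups/root").json()["processGroupFlow"]["id"]

    for i, table in enumerate(tables):
        body = {"revision": {"version": 0},   # version 0 for newly created components
                "component": {"name": "acquire_" + table,
                              "position": {"x": 0.0, "y": i * 220.0}}}
        resp = requests.post(NIFI + "/process-groups/" + root_id + "/process-groups", json=body)
        resp.raise_for_status()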

You do not have to start from scratch; you can make use of this Python API for managing NiFi<https://pypi.org/project/nipyapi/>.  Or you could go with a more fully developed automation approach provided by Kylo<http://kylo.io/>.
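As a rough sketch of the same idea with nipyapi (again, the host, the table list, and the tablename variable are placeholders):

    import nipyapi

    nipyapi.config.nifi_config.host = "http://localhost:8080/nifi-api"

    root = nipyapi.canvas.get_process_group(nipyapi.canvas.get_root_pg_id(), "id")
    for i, table in enumerate(["customers", "orders"]):   # ...all 200 tables
        pg = nipyapi.canvas.create_process_group(root, "acquire_" + table, (0.0, i * 220.0))
        # Stamp the table name into the group's variable registry so the flow
        # inside can reference it as ${tablename}
        nipyapi.canvas.update_variable_registry(pg, [("tablename", table)])

Because the loop drives everything from the table list, re-running it after a change touches all 200 groups in one pass instead of requiring 200 manual edits.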

Good luck,
Paul Gibeault

From: Kuhfahl, Bob [mailto:rkuhfahl@mitre.org]
Sent: Friday, August 17, 2018 7:55 AM
To: users@nifi.apache.org
Subject: [EXT] Design pattern advice needed

Problem:

  *   Source database with over 200 tables.
  *   Current NiFi ‘system’ we are developing can extract data from those 200 tables into NiFi flows of JSON-formatted data – essentially a separate flow for each table, with an attribute that indicates the tablename and other useful attributes, but NOT the schema.
  *   Do some data transforms and prepare the data for the target database load.  This is where I am struggling.
  *   Large volume of data so we need to batch load using PutDatabaseRecord.
  *   PutDatabaseRecord record readers such as JsonPathReader need a property defined for each element in the data – I’d need to define over 200 instances of PutDatabaseRecord and route based on the tablename.  Not.
  *   AvroReader seems almost a natural fit; I can InferAvroSchema from the JSON, but I’m not finding an easy way to convert the JSON to Avro…
  *   CSVReader seems like the only other choice, but the manual conversion of formats might also be a pain…

Thoughts on solutions?