Posted to users@nifi.apache.org by Ameer Mawia <am...@gmail.com> on 2018/10/31 21:04:11 UTC

Fwd: NIFI Usage for Data Transformation

We have a use case where we take data from a source (text data in CSV
format), transform and manipulate the textual records, and output
the data in another (CSV) format. This is currently done by a custom
Java-based framework, written specifically for this *transformation* piece.
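To make the question concrete, here is a minimal sketch of the sort of record-level CSV-to-CSV transform such a custom Java framework might perform. The column layout and the specific transformations (joining first and last name, converting an amount to cents) are hypothetical. The naive comma split assumes no quoted fields or embedded commas; real-world CSV would need a proper CSV parser.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical record-level CSV transform illustrating per-record
// manipulation. Assumes a simple CSV dialect: comma delimiter,
// no quoted fields, no embedded commas or newlines.
final class CsvRecordTransform {

    // Assumed input layout:  id,first_name,last_name,amount
    // Assumed output layout: id,FULL NAME,amount_in_cents
    static String transformLine(String line) {
        String[] in = line.split(",", -1); // -1 keeps trailing empty fields
        if (in.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        String id = in[0].trim();
        String fullName = (in[1].trim() + " " + in[2].trim()).toUpperCase(Locale.ROOT);
        // "12.34" -> 1234; throws ArithmeticException on sub-cent precision.
        long cents = new BigDecimal(in[3].trim()).movePointRight(2).longValueExact();
        return String.join(",", id, fullName, Long.toString(cents));
    }

    // Applies transformLine to every non-blank input line.
    static List<String> transform(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                out.add(transformLine(line));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("1,ada,lovelace,12.34", "2,alan,turing,5");
        transform(input).forEach(System.out::println);
    }
}
```

Running `main` prints `1,ADA LOVELACE,1234` and `2,ALAN TURING,500`.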

Recently, as Apache NiFi is being adopted at the enterprise level by the
organisation, we have been asked to try *Apache NiFi* and see if we can use
it as a replacement for this custom tool.

*My question is*:

   - How much leverage does *Apache NiFi* provide for flowfile *content
   manipulation*?

I understand *NiFi* is good for creating data flow pipelines, but is it good
for *extensive TEXT transformation* as well? So far I have not found an
obvious way to achieve that.

Appreciate the feedback.

Thanks,

-- 
http://ca.linkedin.com/in/ameermawia
Toronto, ON

Re: NIFI Usage for Data Transformation

Posted by Mark Rachelski <ma...@acommerce.asia>.
Ameer,

Allow me to provide an opinion on this as a user, not as one of the awesome
people in this group who have built this very cool tool. They are likely to
be very enthusiastic.

In my experience, NiFi is not going to easily replace a custom tool that
performs a number of complex transforms. For example, we still use
Pentaho Data Integration for some of our more elaborate ETL flows. NiFi does
have some basic capabilities for text manipulation, including:

   - Jolt transforms (JoltTransformJSON) - focused on JSON transforms
   - Regex processing (e.g. ReplaceText) - straight-up text search/replace
   - Some level of splitting and merging files (e.g. SplitText, MergeContent)
   - I am sure someone will point out that you can embed your own custom
   transformer by writing a NiFi processor (either native or scripted) or
   by invoking your own scripts from a NiFi processor. All of this is
   possible, although it does require the security of the NiFi server itself
   to be relaxed a bit so that the development team building these scripts
   can deploy them to the server running NiFi (especially if the scripts are
   external to the flow).
   - Furthermore, NiFi can be used to trigger some other tools that are
   doing heavy transforms such as Spark jobs or other Hadoop-based transforms
   if you have to manipulate large data sets.
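The "regex processing" item above boils down to regular-expression search/replace over flowfile content, which in NiFi is typically done with the ReplaceText processor. As a rough illustration of what such a step does (this is plain `java.util.regex`, not NiFi's API), here is a line-by-line replacement that rewrites ISO dates to DD/MM/YYYY; the date rewrite itself is a hypothetical example.

```java
import java.util.regex.Pattern;

// Plain-Java sketch of a line-by-line regex search/replace, similar in
// spirit to a regex-based NiFi text processor. This is NOT NiFi's API;
// the pattern and replacement below are hypothetical examples.
final class RegexLineReplace {

    // Captures year, month and day from an ISO date such as 2018-10-31.
    private static final Pattern ISO_DATE =
            Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Rewrites every ISO date to DD/MM/YYYY, processing one line at a time
    // and preserving the original line breaks (including a trailing one).
    static String replaceDates(String content) {
        String[] lines = content.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            out.append(ISO_DATE.matcher(lines[i]).replaceAll("$3/$2/$1"));
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(replaceDates("1,2018-10-31\n2,2018-11-01\n"));
    }
}
```

Note that a regex like this has no notion of CSV quoting or column positions; it only sees characters on a line.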

What you will likely find is that if your transforms require some level of
data joins inside the ETL pipeline, which are currently being done by your
custom scripts, NiFi will not be able to help with that without resorting to
driving it through a data engine of some type (be it a SQL engine, Hadoop,
etc.).
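For illustration, this is the kind of join a custom script often does: load a reference data set into memory, then enrich each record from it. The sketch below is a plain-Java inner hash join over two hypothetical CSV layouts (customers and orders, no quoted fields). It needs the whole reference set available at once, and that cross-record state is what, as noted above, tends to push larger joins toward a data engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory inner hash join of two simple CSV data sets
// (comma-delimited, no quoted fields), as a custom ETL script might do.
final class CsvHashJoin {

    // customers: id,name
    // orders:    order_id,customer_id,amount
    // result:    order_id,customer_id,name,amount
    static List<String> join(List<String> customers, List<String> orders) {
        // Build phase: index the reference side by its join key.
        Map<String, String> nameById = new HashMap<>();
        for (String c : customers) {
            String[] f = c.split(",", -1);
            nameById.put(f[0], f[1]);
        }
        // Probe phase: look up each order's customer_id in the index.
        List<String> out = new ArrayList<>();
        for (String o : orders) {
            String[] f = o.split(",", -1);
            String name = nameById.get(f[1]);
            if (name != null) { // inner join: drop orders with no match
                out.add(String.join(",", f[0], f[1], name, f[2]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> customers = List.of("c1,Ada", "c2,Alan");
        List<String> orders = List.of("o1,c1,10", "o2,c3,5", "o3,c2,7");
        join(customers, orders).forEach(System.out::println);
    }
}
```

Running `main` prints `o1,c1,Ada,10` and `o3,c2,Alan,7`; order `o2` is dropped because customer `c3` has no match.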

But, and admittedly this is a guess, I suspect what your enterprise
groups are going for is the very natural support for non-functional
requirements that NiFi, or any framework worth anything, will provide. That
is, things like:

   - Monitoring
   - Consistent reporting
   - Provenance of data transforms and traceability
   - Packing a large number of flows into a few machines

With this, they can possibly host NiFi as a service and your teams simply
contribute flows.

At my company, we do not build our data ingest world exclusively around
NiFi, but we do use it pretty widely to get disparate data sets into our
data platform, and it helps a lot. It means that the developers can focus on
writing good ingest flows instead of continually ensuring that every custom
script feeds into the non-functional requirements of our own data
platform. But admittedly, any good framework will also help you address
those requirements with little repeated effort.

We also still keep some custom ETL jobs that are just not worth the effort
of porting to NiFi. We don't see those jobs as technical debt. They
work and are not causing us any issues. But they had to be independently
developed to meet those non-functional requirements.

The best thing I can suggest is that you try it and see. As a long-time
imperative programmer, I did find it a bit difficult to wrap my head
around the dynamic nature of building flows. But once you invest the effort
to learn, it becomes a pretty cool tool in your toolbox. Eventually,
writing flows becomes faster than writing imperative scripts, and the
testing cycle is significantly shorter.

Regards,
Mark.
