Posted to users@nifi.apache.org by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com> on 2015/10/29 01:02:47 UTC

NiFi for CEP

I've been studying NiFi and there is something I'm not quite understanding. 
Can NiFi be used in place of Storm or Spark Streaming to process streaming 
data?

B. 


Re: NiFi for CEP

Posted by Joe Witt <jo...@gmail.com>.
I suspect Charlie's writeup is accurate from a traditional relational
DB ETL perspective.  I think you'll see CDC mechanisms increasingly
available through NiFi and you'll see us increasingly add features
around these use cases but oriented again from the perspective of
getting the data to systems such as those in the Hadoop ecosystem.  I
think the idea of NiFi supporting wide and shallow capabilities for
these DB use cases is probably about right even as we go forward.
That said if there are cases we can support well let's discuss them.


Thanks
Joe

On Thu, Oct 29, 2015 at 2:49 AM, Charlie Frasure
<ch...@gmail.com> wrote:
> Bob,
>
> I can relate to your background, and thought process for NiFi.  My limited
> experience with it has highlighted the "simple event processing" portion of
> the tool.  It can replace some of the things that SSIS, Informatica, or
> AbInitio are used for, but has a much wider and more shallow focus.  The ETL
> tools are focused on transformations like joins, aggregates, columns to
> rows, or rows to columns, change data capture etc.  NiFi could do some of
> these things, but seems to be designed more for iterative processing of file
> objects.  It's a lightweight data processing tool, or a lightweight service
> bus, depending on your perspective.
>
> You could probably use the sql processor to insert records, assign new DW
> keys etc, and if you already have your logic in stored procedures, you might
> even be able to use NiFi as a process manager, passing the return values as
> needed until you eventually use the email processor to send a notification
> that the warehouse load is complete.  But I don't think that you want to try
> to manage  the tasks associated with relational and dimensional models from
> within NiFi itself.
>
> Charlie
>
>
> On Wed, Oct 28, 2015 at 10:29 PM, Adaryl "Bob" Wakefield, MBA
> <ad...@hotmail.com> wrote:
>>
>> My work is mostly old world ETL. I create data pipelines using SSIS where
>> I move data usually from flat files into a data warehouse. Since I
>> discovered the Hadoop ecosystem, I’ve been looking for ways to speed up data
>> warehouse load even to a point where I’m loading data in real time. I’ve
>> seen Storm and Spark used as a way to do that. However, I’m not a Java
>> developer (yet) and Storm has a pretty steep learning curve for me. When
>> NiFi got announced, I looked at it and said, “hey this looks like SSIS for
>> big data”. So I’ve been kind of looking at it through the lens of visual
>> data flow development.
>>
>> I’ve seen stacks where Storm is used to load data warehouses, but
>> warehouse loads aren’t that complex; using Java to cleanse data kind of
>> seems like overkill. 95% of my work can get done using SQL, and SSIS is just
>> the traffic cop that tells which stored procs to execute.
>>
>> I guess a better question to ask is, can NiFi be used in place of SSIS
>> (Data Stage, Informatica, etc)? So instead of:
>> SSIS –> Warehouse (batch processing paradigm)
>> you have
>> Kafka –> NiFi –> Warehouse (real time processing)
>>
>> Am I even thinking about this correctly? I know we’re talking about moving
>> data between systems but frequently the move I deal with is on the same box
>> and there is some other piece of software that drops the files to be
>> processed into a folder.
>>
>> B.
>>
>> From: Joe Percivall
>> Sent: Wednesday, October 28, 2015 7:43 PM
>> To: users@nifi.apache.org
>> Subject: Re: NiFi for CEP
>>
>> Hey Bob,
>>
>> It really depends on your definition of CEP (complex event processing). If
>> what you're trying to do is advanced processing on a single piece of data
>> (anything from small sensor data to huge medical data)  NiFi could
>> potentially be a great candidate to replace Storm or Spark Streaming. If
>> you're trying to do advanced analytics on many pieces of data or create a
>> stateful response to data then Storm or Spark Streaming would handle that
>> better (but NiFi would do a great job getting the data from the edge to the
>> other tech!).
>>
>> There are others that can go into more depth but that's it in a nutshell,
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joepercivall@yahoo.com
>>
>>
>>
>>
>> On Wednesday, October 28, 2015 8:02 PM, "Adaryl "Bob" Wakefield, MBA"
>> <ad...@hotmail.com> wrote:
>>
>>
>> I've been studying NiFi and there is something I'm not quite
>> understanding.
>> Can NiFi be used in place of Storm or Spark Streaming to process streaming
>> data?
>>
>>
>> B.
>>
>>
>>
>

Re: NiFi for CEP

Posted by Charlie Frasure <ch...@gmail.com>.
Bob,

I can relate to your background, and thought process for NiFi.  My limited
experience with it has highlighted the "simple event processing" portion of
the tool.  It can replace some of the things that SSIS, Informatica, or
AbInitio are used for, but has a much wider and more shallow focus.  The
ETL tools are focused on transformations like joins, aggregates, columns to
rows, or rows to columns, change data capture etc.  NiFi could do some of
these things, but seems to be designed more for iterative processing of
file objects.  It's a lightweight data processing tool, or a lightweight
service bus, depending on your perspective.

You could probably use the sql processor to insert records, assign new DW
keys etc, and if you already have your logic in stored procedures, you
might even be able to use NiFi as a process manager, passing the return
values as needed until you eventually use the email processor to send a
notification that the warehouse load is complete.  But I don't think that
you want to try to manage  the tasks associated with relational and
dimensional models from within NiFi itself.

Charlie


On Wed, Oct 28, 2015 at 10:29 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefield@hotmail.com> wrote:

> My work is mostly old world ETL. I create data pipelines using SSIS where
> I move data usually from flat files into a data warehouse. Since I
> discovered the Hadoop ecosystem, I’ve been looking for ways to speed up
> data warehouse load even to a point where I’m loading data in real time.
> I’ve seen Storm and Spark used as a way to do that. However, I’m not a Java
> developer (yet) and Storm has a pretty steep learning curve for me. When
> NiFi got announced, I looked at it and said, “hey this looks like SSIS for
> big data”. So I’ve been kind of looking at it through the lens of visual
> data flow development.
>
> I’ve seen stacks where Storm is used to load data warehouses, but
> warehouse loads aren’t that complex; using Java to cleanse data kind of
> seems like overkill. 95% of my work can get done using SQL, and SSIS is
> just the traffic cop that tells which stored procs to execute.
>
> I guess a better question to ask is, can NiFi be used in place of SSIS
> (Data Stage, Informatica, etc)? So instead of:
> SSIS –> Warehouse (batch processing paradigm)
> you have
> Kafka –> NiFi –> Warehouse (real time processing)
>
> Am I even thinking about this correctly? I know we’re talking about moving
> data between systems but frequently the move I deal with is on the same box
> and there is some other piece of software that drops the files to be
> processed into a folder.
>
> B.
>
> *From:* Joe Percivall <jo...@yahoo.com>
> *Sent:* Wednesday, October 28, 2015 7:43 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi for CEP
>
> Hey Bob,
>
> It really depends on your definition of CEP (complex event processing). If
> what you're trying to do is advanced processing on a single piece of data
> (anything from small sensor data to huge medical data)  NiFi could
> potentially be a great candidate to replace Storm or Spark Streaming. If
> you're trying to do advanced analytics on many pieces of data or create a
> stateful response to data then Storm or Spark Streaming would handle that
> better (but NiFi would do a great job getting the data from the edge to the
> other tech!).
>
> There are others that can go into more depth but that's it in a nutshell,
> Joe
> - - - - - -
> *Joseph Percivall*
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
>
> On Wednesday, October 28, 2015 8:02 PM, "Adaryl "Bob" Wakefield, MBA" <
> adaryl.wakefield@hotmail.com> wrote:
>
>
> I've been studying NiFi and there is something I'm not quite
> understanding.
> Can NiFi be used in place of Storm or Spark Streaming to process streaming
> data?
>
>
> B.
>
>
>
>

Re: NiFi for CEP

Posted by "Adaryl \"Bob\" Wakefield, MBA" <ad...@hotmail.com>.
My work is mostly old world ETL. I create data pipelines using SSIS where I move data usually from flat files into a data warehouse. Since I discovered the Hadoop ecosystem, I’ve been looking for ways to speed up data warehouse load even to a point where I’m loading data in real time. I’ve seen Storm and Spark used as a way to do that. However, I’m not a Java developer (yet) and Storm has a pretty steep learning curve for me. When NiFi got announced, I looked at it and said, “hey this looks like SSIS for big data”. So I’ve been kind of looking at it through the lens of visual data flow development.

I’ve seen stacks where Storm is used to load data warehouses, but warehouse loads aren’t that complex; using Java to cleanse data kind of seems like overkill. 95% of my work can get done using SQL, and SSIS is just the traffic cop that tells which stored procs to execute.

I guess a better question to ask is, can NiFi be used in place of SSIS (Data Stage, Informatica, etc)? So instead of:
SSIS –> Warehouse (batch processing paradigm)
you have
Kafka –> NiFi –> Warehouse (real time processing)

Am I even thinking about this correctly? I know we’re talking about moving data between systems but frequently the move I deal with is on the same box and there is some other piece of software that drops the files to be processed into a folder.

B.

From: Joe Percivall 
Sent: Wednesday, October 28, 2015 7:43 PM
To: users@nifi.apache.org 
Subject: Re: NiFi for CEP

Hey Bob,


It really depends on your definition of CEP (complex event processing). If what you're trying to do is advanced processing on a single piece of data (anything from small sensor data to huge medical data)  NiFi could potentially be a great candidate to replace Storm or Spark Streaming. If you're trying to do advanced analytics on many pieces of data or create a stateful response to data then Storm or Spark Streaming would handle that better (but NiFi would do a great job getting the data from the edge to the other tech!).

There are others that can go into more depth but that's it in a nutshell,
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com





On Wednesday, October 28, 2015 8:02 PM, "Adaryl "Bob" Wakefield, MBA" <ad...@hotmail.com> wrote:




I've been studying NiFi and there is something I'm not quite understanding. 
Can NiFi be used in place of Storm or Spark Streaming to process streaming 
data? 


B. 





Re: NiFi for CEP

Posted by Joe Percivall <jo...@yahoo.com>.
Hey Bob,
It really depends on your definition of CEP (complex event processing). If what you're trying to do is advanced processing on a single piece of data (anything from small sensor data to huge medical data)  NiFi could potentially be a great candidate to replace Storm or Spark Streaming. If you're trying to do advanced analytics on many pieces of data or create a stateful response to data then Storm or Spark Streaming would handle that better (but NiFi would do a great job getting the data from the edge to the other tech!).

There are others that can go into more depth but that's it in a nutshell,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com
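
The distinction above, between processing one piece of data at a time and building a stateful response across many pieces of data, can be sketched in plain Python. This is only an illustration under assumptions: the sensor events, the `enrich` function, and the `RollingAverage` class are hypothetical names invented here, and none of this is NiFi, Storm, or Spark API.

```python
from collections import defaultdict, deque


def enrich(event):
    """Stateless, per-event step: looks at one reading and nothing else."""
    return {**event, "temp_f": event["temp_c"] * 9 / 5 + 32}


class RollingAverage:
    """Stateful step: remembers the last `size` readings for each sensor."""

    def __init__(self, size):
        self.windows = defaultdict(lambda: deque(maxlen=size))

    def update(self, event):
        window = self.windows[event["sensor"]]
        window.append(event["temp_c"])
        return sum(window) / len(window)


# Hypothetical sensor readings, in arrival order.
events = [
    {"sensor": "a", "temp_c": 20.0},
    {"sensor": "a", "temp_c": 22.0},
    {"sensor": "b", "temp_c": 30.0},
    {"sensor": "a", "temp_c": 24.0},
]

# Each event is handled independently of every other event.
enriched = [enrich(e) for e in events]

# The result for each event depends on the events seen before it.
rolling = RollingAverage(size=2)
averages = [rolling.update(e) for e in events]
```

The first step needs no memory between events, which is the kind of work described above as a good fit for NiFi. The second has to keep state across events, which is the case where Storm or Spark Streaming is suggested instead.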

On Wednesday, October 28, 2015 8:02 PM, "Adaryl "Bob" Wakefield, MBA" <ad...@hotmail.com> wrote:

I've been studying NiFi and there is something I'm not quite understanding. 
Can NiFi be used in place of Storm or Spark Streaming to process streaming 
data?

B. 



  

Re: NiFi for CEP

Posted by Joe Witt <jo...@gmail.com>.
Hello

I do not recommend viewing NiFi as something to use 'instead of' Storm
or Spark.  NiFi's design and the goals behind it are focused on
managing the flow of information from systems which create data, to
systems that process data, to systems that store data.  Invariably in
the process of connecting the many systems that exist throughout the
enterprise there are mismatches of protocol, format, schema, priority,
interest, authority, etc. In solving these mismatches NiFi is often
used to do 'data processing'.  Examples of data processing that NiFi
aims to be very compelling for include things like transforming from
one format to another, filtering, sanitization, enrichment,
aggregation, splitting, etc.
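
As a rough illustration of the kinds of in-flight processing listed above (format conversion, filtering, enrichment, splitting), here is a minimal sketch in plain Python. It is not NiFi code: the `flow` function, the sample CSV, and the `source` field are assumptions made up for this example.

```python
import csv
import io
import json

# Hypothetical input: a small CSV payload as it might arrive from a source system.
RAW = "id,level,msg\n1,INFO,started\n2,DEBUG,tick\n3,ERROR,disk full\n"


def flow(raw_csv, source):
    """Convert CSV to JSON, drop DEBUG rows, tag each record with its source."""
    rows = csv.DictReader(io.StringIO(raw_csv))          # parse the incoming format
    kept = (r for r in rows if r["level"] != "DEBUG")    # filtering
    enriched = ({**r, "source": source} for r in kept)   # enrichment
    # Splitting: one JSON document per record instead of one file.
    return [json.dumps(r, sort_keys=True) for r in enriched]


out = flow(RAW, "host-1")
```

Every step here works on one record at a time while the data moves from one system to the next.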

In this light there are certainly processing functions done in systems
like Spark or Storm that NiFi would be perfectly fine and perhaps even
better suited for.  But, it is important to keep in mind the focus and
intent of the systems.  In my view, NiFi is oriented around dataflow -
connecting systems within and between datacenters and so on.  Storm
and Spark are focused on processing/analysis of that data.  So these
systems make different design trade-offs and provide different user
experiences centered around their intent.  As a result I think the
more compelling story is about how to leverage the strengths of these
sorts of systems together.

Always tough to answer this question in generic terms.  Happy to talk
through specific use cases.

Thanks
Joe

On Wed, Oct 28, 2015 at 8:02 PM, Adaryl "Bob" Wakefield, MBA
<ad...@hotmail.com> wrote:
> I've been studying NiFi and there is something I'm not quite understanding.
> Can NiFi be used in place of Storm or Spark Streaming to process streaming
> data?
>
> B.