Posted to dev@nifi.apache.org by nehakaushik86 <ne...@gmail.com> on 2016/05/06 05:04:51 UTC

Apache Nifi Vs Spring XD, which one is better

Hi,

We are designing a system that needs a data ingestion framework. The data
will be consumed from various data systems: databases, social feeds, text
files, CRM, etc. Can you let me know how Apache NiFi compares to Spring XD
and which use cases it is best suited for?


Also, I would like to understand the difference between Apache NiFi's
ExecuteSQL and Apache Sqoop. We are planning to ingest a huge amount of data
from a database: millions of records. Will ExecuteSQL be able to load such a
huge volume?



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Apache-Nifi-Vs-Spring-XD-which-one-is-better-tp9963.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Apache Nifi Vs Spring XD, which one is better

Posted by Brandon DeVries <br...@jhu.edu>.
All,

It seems like we get this sort of question a lot, and Simon's answer here
was really good.  We've had similar discussions for Kafka[1], Storm and
Spark[2]. Should we think about adding a comparison to other technologies /
applications to the FAQ?  Not in a sales sheet sort of way, but in a way
that emphasizes how these technologies complement each other.  Obviously we
don't need to go out and find every comparable technology, but having a
place to put answers like Simon's that is easier to reference than the
Apache mail archive might be beneficial.

Brandon

[1] https://groups.google.com/forum/#!topic/confluent-platform/JKeccNEhwaQ
[2] http://www.zdnet.com/article/hortonworks-cto-on-apache-nifi-what-is-it-and-why-does-it-matter-to-iot/


On Fri, May 6, 2016 at 6:09 AM Simon Ball <sb...@hortonworks.com> wrote:

> ExecuteSQL can certainly deal with millions of rows. Sqoop currently makes
> more sense if you want to distribute the query processing across a large
> number of nodes (if you have hundreds of millions of rows, 10-100+ GB, or
> TBs of data) and write directly into Hadoop. If you’re looking for
> functionality like Sqoop’s incremental imports, then check out
> QueryDatabaseTable. As long as you set a sensible fetch size on that
> (around 1000 is usually good, but it depends on row size), I’ve seen very
> small NiFi instances (AWS t2.small) cope with a few million rows in the
> order of 10 seconds.
>
> Spring XD is really a different beast from NiFi. It follows a code->deploy
> pattern rather than a command-and-control pattern for data flow. Once you
> deploy a Spring XD flow, it’s fixed (more like Spark, Storm, etc.: compile,
> deploy, never change). Spring XD recently added some visual design, but Flo
> is primarily a retrospective development environment (monitor a flow, not
> design it).
>
> NiFi also runs out to the edge and gets the data. Spring XD runs in a core
> cluster (e.g. on YARN). So in this scenario, Spring XD is more like Beam or
> Spark Streaming. NiFi, however, with site-to-site, can be used to run right
> out at the edge, secure and transport data, and track it from origin. This
> means NiFi is actually a complement to technology like Spring XD and Beam.
> NiFi feeds these heavier-weight streaming frameworks: it handles the data
> movement and simple event processing, then ingests the data for more
> complex analytics with the likes of XD.
>
> So in short, the technologies are complementary. NiFi has the edge in
> reaching out to collect data; XD may be better for complex analytics.
>
> Simon

Re: Apache Nifi Vs Spring XD, which one is better

Posted by Simon Ball <sb...@hortonworks.com>.
ExecuteSQL can certainly deal with millions of rows. Sqoop currently makes more sense if you want to distribute the query processing across a large number of nodes (if you have hundreds of millions of rows, 10-100+ GB, or TBs of data) and write directly into Hadoop. If you’re looking for functionality like Sqoop’s incremental imports, then check out QueryDatabaseTable. As long as you set a sensible fetch size on that (around 1000 is usually good, but it depends on row size), I’ve seen very small NiFi instances (AWS t2.small) cope with a few million rows in the order of 10 seconds.
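
As a rough illustration of the incremental-import idea mentioned above (this is not NiFi's or Sqoop's actual implementation), the Python sketch below remembers the largest value seen in an increasing column and, on each run, pulls only newer rows in batches of a configurable fetch size. The `events` table, its `id` and `payload` columns, the `state` dictionary, and the function names are all hypothetical, and an in-memory SQLite database stands in for the real source database.

```python
import sqlite3


def fetch_new_rows(conn, last_max_id, fetch_size=1000):
    """Yield batches of rows whose id is greater than last_max_id."""
    cur = conn.cursor()
    cur.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_max_id,),
    )
    while True:
        # Pull at most fetch_size rows at a time instead of the whole result.
        batch = cur.fetchmany(fetch_size)
        if not batch:
            break
        yield batch


def incremental_import(conn, state, fetch_size=1000):
    """Import rows added since the previous run and return how many were read.

    state["max_id"] holds the largest id seen so far (hypothetical state store).
    """
    imported = 0
    for batch in fetch_new_rows(conn, state["max_id"], fetch_size):
        imported += len(batch)
        # Rows are ordered by id, so the last row carries the batch maximum.
        state["max_id"] = batch[-1][0]
    return imported


# Demo data: 2500 rows, then 100 more between the two runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 2501)],
)

state = {"max_id": 0}
first_run = incremental_import(conn, state)

conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(2501, 2601)],
)
second_run = incremental_import(conn, state)
```

In this sketch the second run reads only the rows inserted after the first one, because the query filters on the stored maximum. The right fetch size remains a tuning choice that depends on row size, as noted above.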

Spring XD is really a different beast from NiFi. It follows a code->deploy pattern rather than a command-and-control pattern for data flow. Once you deploy a Spring XD flow, it’s fixed (more like Spark, Storm, etc.: compile, deploy, never change). Spring XD recently added some visual design, but Flo is primarily a retrospective development environment (monitor a flow, not design it).

NiFi also runs out to the edge and gets the data. Spring XD runs in a core cluster (e.g. on YARN). So in this scenario, Spring XD is more like Beam or Spark Streaming. NiFi, however, with site-to-site, can be used to run right out at the edge, secure and transport data, and track it from origin. This means NiFi is actually a complement to technology like Spring XD and Beam. NiFi feeds these heavier-weight streaming frameworks: it handles the data movement and simple event processing, then ingests the data for more complex analytics with the likes of XD.

So in short, the technologies are complementary. NiFi has the edge in reaching out to collect data; XD may be better for complex analytics.

Simon

