Posted to users@nifi.apache.org by Jonathan Meran <jo...@sonos.com> on 2019/01/11 17:02:14 UTC

NiFI as Data Pipeline Orchestration Tool?

Hello,
I am looking into the possibility of using NiFi as a Data Pipeline Orchestration Tool. I’m evaluating NiFi along with some other tools such as Airflow and AWS Step Functions/Lambdas.

Has anyone used NiFi as an orchestration/scheduling tool for tasks such as submitting Spark jobs to an EMR cluster? These are some of the requirements we are considering while evaluating such a tool:


  1.  SSH capabilities to execute remote commands
  2.  Rich scheduling (CRON)
  3.  Ability to write custom routines and import custom libraries
  4.  Event-based triggering of a pipeline

Any insight would be helpful. We have used NiFi for about a year now for data movement and are familiar with its capabilities. My biggest worry is the ability to coordinate with other machines using SSH.

Thanks,
Jon

Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Sivaprasanna <si...@gmail.com>.
I would agree with Joe: NiFi is not primarily an orchestration tool, so it
may not offer you a full-fledged orchestration tool's experience. Having
said that, we have been using NiFi to launch Spark jobs in our HDP and
Azure HDInsight clusters. We are leveraging the Livy service available in
the clusters to do the job, using the InvokeHTTP processor to submit the
job through Livy.

-
Sivaprasanna
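
For readers unfamiliar with Livy, here is a minimal sketch of the kind of
HTTP request that a batch submission involves. It is not taken from the
setup described above: the host name, jar path, class name, and argument
are hypothetical, and the helper function is purely illustrative. It relies
only on Livy's documented POST /batches endpoint, which accepts a JSON body
with fields such as "file", "className", and "args". In a NiFi flow, the
same JSON would presumably be the FlowFile content sent by InvokeHTTP with
the POST method.

```python
import json
import urllib.request


def build_livy_batch_request(livy_url, app_file, class_name=None, args=None):
    """Build (but do not send) a POST request for Livy's /batches endpoint."""
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, jar path, and class name, for illustration only.
request = build_livy_batch_request(
    "http://livy.example.internal:8998",
    "hdfs:///jobs/etl-job.jar",
    class_name="com.example.EtlJob",
    args=["2019-01-11"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Actually submitting the job would be: urllib.request.urlopen(request)
```

The request is only constructed here, not sent, so the sketch can be run
without a cluster. Livy answers a real submission with a JSON description
of the batch, which a flow could then poll for completion.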

On Sat, Jan 12, 2019 at 3:16 AM Otto Fowler <ot...@gmail.com> wrote:

> You may want to monitor https://issues.apache.org/jira/browse/NIFI-3698

Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Otto Fowler <ot...@gmail.com>.
You may want to monitor https://issues.apache.org/jira/browse/NIFI-3698



On January 11, 2019 at 14:22:24, Jonathan Meran (jonathan.meran@sonos.com)
wrote:

Thanks Joe!



We appreciate the kind words and are happy you enjoy our products!



My thinking is aligned with yours for sure. A main driver for the
consideration of NiFi for orchestration is that it’s a system we already
have up and running and maintain.



Thanks again,

Jon




Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Jonathan Meran <jo...@sonos.com>.
Thanks Joe!

We appreciate the kind words and are happy you enjoy our products!

My thinking is aligned with yours for sure. A main driver for the consideration of NiFi for orchestration is that it’s a system we already have up and running and maintain.

Thanks again,
Jon

From: Joe Witt <jo...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, January 11, 2019 at 12:28 PM
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: NiFI as Data Pipeline Orchestration Tool?


Jon

First things first - Sonos is awesome.

Now back to the matter at hand... NiFi is quite often used for various forms of orchestration of other systems doing their thing.  However, I'll state that isn't really its primary purpose so for pure orchestration cases it can leave you with a less than ideal user experience.

NiFi is more about managing the flow of data to and from systems and doing the necessary routing/splitting/forking/joining/merging/transforming/cajoling to make that work well.  We're less about telling those other systems what to do with the data or when to run.

Now, having said this, that kind of use is pretty common.  We have the Spark Livy integration, for example.  I'd recommend you give tools that cater primarily to orchestration a first stab at this, and if you find the problem looks more and more like what I describe, then NiFi is probably appropriate.

Hope that helps a bit.  Talking at the level of terminology is tough, as things like ETL, orchestration, and transformation often mean wildly different things to different people.

Thanks


Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Joe Witt <jo...@gmail.com>.
Jon

First things first - Sonos is awesome.

Now back to the matter at hand... NiFi is quite often used for various
forms of orchestration of other systems doing their thing.  However, I'll
state that isn't really its primary purpose so for pure orchestration cases
it can leave you with a less than ideal user experience.

NiFi is more about managing the flow of data to and from systems and doing
the necessary
routing/splitting/forking/joining/merging/transforming/cajoling to make
that work well.  We're less about telling those other systems what to do
with the data or when to run.

Now, having said this, that kind of use is pretty common.  We have the
Spark Livy integration, for example.  I'd recommend you give tools that
cater primarily to orchestration a first stab at this, and if you find the
problem looks more and more like what I describe, then NiFi is probably
appropriate.

Hope that helps a bit.  Talking at the level of terminology is tough, as
things like ETL, orchestration, and transformation often mean wildly
different things to different people.

Thanks


Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Jerry Vinokurov <gr...@gmail.com>.
Hi all,

In our application, we faced the same problem. To solve it, we wrote a
Django app that sat at the center of the interaction between NiFi and
several other systems (including Spark and another internal application)
and used it to dispatch tasks as needed. In that architecture, NiFi was not
itself the orchestrator, but rather interacted with another application
that filled that role. We found that this was a good solution to our
problems: it properly divided responsibilities between what NiFi was good
at doing (moving files from place to place) and what was better done in
Python code (many of the tasks described above). If you don't want to go so
far as to write your own orchestrator, you might want to check out
crossbar.io, which could serve the function of communicating between
different services.

Jerry

On Tue, Jan 22, 2019 at 11:05 AM Otto Fowler <ot...@gmail.com>
wrote:

> I wonder how NiFi would look, or would have to look, to support batch cases.
>
>

-- 
http://www.google.com/profiles/grapesmoker

Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Otto Fowler <ot...@gmail.com>.
I wonder how NiFi would look, or would have to look, to support batch cases.


On January 22, 2019 at 10:24:10, Boris Tyukin (boris@boristyukin.com) wrote:

We've looked at both... Airflow might be a way better tool for
coordination/scheduling. Why don't you take one of your pipelines and try
to implement it in both tools?

We really liked Airflow, but unfortunately it was not a good fit for
real-time processes - that's why we decided to go with NiFi. But if you use
NiFi strictly for job coordination and typical ETL-like dependencies, you
will have a hard time. Things that are easy and obvious with Airflow or ETL
tools like Informatica or SSIS are quite difficult with NiFi. Just check
some examples of Wait/Notify or merge patterns and you will see why.

IMHO, since NiFi was designed from the ground up to support real-time use
cases rather than batch cases, its design and approach are quite different
from batch-oriented tools like Airflow.

Boris


Re: NiFI as Data Pipeline Orchestration Tool?

Posted by Boris Tyukin <bo...@boristyukin.com>.
We've looked at both... Airflow might be a way better tool for
coordination/scheduling. Why don't you take one of your pipelines and try
to implement it in both tools?

We really liked Airflow, but unfortunately it was not a good fit for
real-time processes - that's why we decided to go with NiFi. But if you use
NiFi strictly for job coordination and typical ETL-like dependencies, you
will have a hard time. Things that are easy and obvious with Airflow or ETL
tools like Informatica or SSIS are quite difficult with NiFi. Just check
some examples of Wait/Notify or merge patterns and you will see why.

IMHO, since NiFi was designed from the ground up to support real-time use
cases rather than batch cases, its design and approach are quite different
from batch-oriented tools like Airflow.

Boris
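
To make the "ETL-like dependencies" point concrete, here is a minimal
sketch of what a batch-oriented tool lets you state directly: which task
must finish before which other task starts. This is plain Python using the
standard library's graphlib module (Python 3.9+), not Airflow's API, and
the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical batch pipeline: each task maps to the set of tasks
# that must complete before it may start.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join_orders_customers"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In a flow-based tool, the "join" step is where the difficulty shows up: the
flow has to hold one branch until the other has signalled completion, which
is the situation the Wait/Notify and merge patterns mentioned above
address.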
