Posted to dev@nifi.apache.org by "Adunuthula, Seshu" <sa...@ebay.com> on 2015/05/26 22:48:16 UTC

Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]

Hello Folks,

Finally got to install NiFi and got the sample flows running and read the Blog article at https://blogs.apache.org/nifi/entry/basic_dataflow_design.

> The question was "Is it possible to have NiFi service setup and running and allow for multiple dataflows to be designed and deployed (running) at the same time?"

I understand the author's argument that you can use NiFi to have a single flow with several inputs rather than several disparate flows. But there are multiple advantages to having NiFi manage several disparate flows.

  *   Managing Flows that have very different transformations
  *   Security: Authorization, who has access to what flows, executing flows as a named user instead of a super user.
  *   Resource Management: Scheduling the resources across disparate flows
  *   Etc

Are there plans for the NiFi service to set up and manage multiple dataflows?

Regards
Seshu Adunuthula




Re: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.
Joe, 

Thanks for the detailed response. Let me spend some time understanding the
model of Process Groups and templates. I guess it is a switch from the
classic model, so it will take time to get used to.


Regards
Seshu

On 5/28/15, 6:45 AM, "Joe Witt" <jo...@gmail.com> wrote:

>Seshu,
>
>NiFi has been used extensively as an enterprise-wide (global) dataflow
>tool.  It supports large teams of people with differing levels of
>authorization and access roles operating on the same cluster,
>supporting vast numbers of different dataflows through the same
>system.  Though it has considerable utility in a classic ETL sense,
>it wasn't necessarily built for classic ETL cases.  It was built for
>the sensor/source to processing to database/warehouse/etc. problem
>on a really massive scale.  In many ways it replaces traditional ETL
>approaches and in others it complements them.  We weren't really
>setting out to replace some particular system and specifically
>weren't inspired by the systems mentioned.  Rather, we set out to
>fill a gap that we saw: effective 'dataflow' is not just 'data
>transport'.
>
>Regarding the open flow/save flow approach we definitely considered
>that.  We often refer to that as the 'design and deploy model'.  In
>many ways that is why we built NiFi in the first place.  There is
>definitely value in that model.  But it also imposes a large drag: it
>creates a significant disconnect between making a change and seeing
>its effect.  That often means slow
>integration activities and when errors occur it isn't easy to find
>root cause.  That model provides a sense of comfort as it is common
>and well known and fits a typical software development model.  But it
>doesn't necessarily reflect the operational needs that can occur which
>require prompt, reliable, verifiable changes to benefit the business.
>
>So the model NiFi supports is that of immediate/real-time changes.  We
>can then create templates of those flows, store them in a registry,
>and folks could share them.  There are additional things we can do to
>support the classic design and deploy model for the cases where it is
>truly essential.  And we're also working with folks to explain the
>value in moving away from that model when they can.  There is no
>single answer for sure but we needed a model that can support both
>sides of that story and that is what we have.  We've started from this
>base of realtime command and control and are adding support for the
>classic model.  But the classic model alone cannot support realtime.
>
>Let's keep the discussion going.  This is good stuff.  We know we can
>and should do more to support the classic view when critical but we
>want to really understand the 'why' behind it.  In some cases folks
>like it because it is what they know, and in others it is truly
>critical.  We
>want to understand those truly critical cases.
>
>Thanks
>Joe


Re: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]

Posted by Joe Witt <jo...@gmail.com>.
Seshu,

NiFi has been used extensively as an enterprise-wide (global) dataflow
tool.  It supports large teams of people with differing levels of
authorization and access roles operating on the same cluster,
supporting vast numbers of different dataflows through the same
system.  Though it has considerable utility in a classic ETL sense,
it wasn't necessarily built for classic ETL cases.  It was built for
the sensor/source to processing to database/warehouse/etc. problem
on a really massive scale.  In many ways it replaces traditional ETL
approaches and in others it complements them.  We weren't really
setting out to replace some particular system and specifically
weren't inspired by the systems mentioned.  Rather, we set out to
fill a gap that we saw: effective 'dataflow' is not just 'data
transport'.

Regarding the open flow/save flow approach we definitely considered
that.  We often refer to that as the 'design and deploy model'.  In
many ways that is why we built NiFi in the first place.  There is
definitely value in that model.  But it also imposes a large drag: it
creates a significant disconnect between making a change and seeing
its effect.  That often means slow
integration activities and when errors occur it isn't easy to find
root cause.  That model provides a sense of comfort as it is common
and well known and fits a typical software development model.  But it
doesn't necessarily reflect the operational needs that can occur which
require prompt, reliable, verifiable changes to benefit the business.

So the model NiFi supports is that of immediate/real-time changes.  We
can then create templates of those flows, store them in a registry,
and folks could share them.  There are additional things we can do to
support the classic design and deploy model for the cases where it is
truly essential.  And we're also working with folks to explain the
value in moving away from that model when they can.  There is no
single answer for sure but we needed a model that can support both
sides of that story and that is what we have.  We've started from this
base of realtime command and control and are adding support for the
classic model.  But the classic model alone cannot support realtime.

Let's keep the discussion going.  This is good stuff.  We know we can
and should do more to support the classic view when critical but we
want to really understand the 'why' behind it.  In some cases folks
like it because it is what they know, and in others it is truly
critical.  We
want to understand those truly critical cases.

Thanks
Joe



On Thu, May 28, 2015 at 8:58 AM, Adunuthula, Seshu <sa...@ebay.com> wrote:
> Mark,
>
> Thanks for the response. Are Process Groups the only abstraction for
> maintaining disparate flows? Did you consider the more traditional Open
> Flow/Save Flow approach?
>
> If I start thinking of NiFi as a replacement for enterprise ETL tools like
> Informatica/Ab Initio in the Hadoop world, I would introduce different
> personas ("Administrator": manages and monitors the flows; "ETL Developer":
> develops and deploys the flows; etc.) and build an authorization model around
> them.
>
> It would definitely complicate the model, but would allow for an
> enterprise-wide deployment of NiFi. Would love to discuss more.
>
> Regards
> Seshu Adunuthula

RE: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]

Posted by Mark Payne <ma...@hotmail.com>.
Seshu,

The Open/Save/Deploy Flow approach is very much what is used pretty much everywhere other than NiFi.
This model is exactly the driving force that caused us to create NiFi to begin with - to avoid requiring
a developer to maintain these flows and deploy them.

Using the Open/Save/Deploy model, there is a very large disconnect between the person who is the domain expert
responsible for understanding what the Enterprise needs in terms of dataflow, and the developer actually implementing it.
The typical use case is that the dataflow expert will determine that a change is needed. He/she will then formalize
the requirement in writing. This is then sent to an engineering manager of some kind who will determine which developer
is appropriate for the task and assign the task. The developer will then implement the task as he or she interprets the requirements.
When all testing is complete, it will be deployed. In the very best case, this cycle is long and drawn-out. In the worst case,
those requirements were not exact enough or false assumptions were made, and what is deployed is not what the dataflow
expert wanted.

Or perhaps what was deployed is exactly what the dataflow expert wanted, but there was a slight flaw in the logic. The
dataflow expert notices the problem and creates a new requirement, and the cycle starts over. These are very long iteration
cycles. As a result, this causes a long delay between realizing an idea and seeing it in production.

So NiFi was largely built to address this issue. We want the ability for the dataflow expert to make the change. The dataflow
expert should be somewhat technical, as they will need to understand data formats, etc., but need not be a developer by any means.
The majority of NiFi operators who create and maintain flows are not developers but rather other subject matter experts.

That being said, developers often are able to use NiFi to build some really interesting flows that an ops person wants to deploy.
For this reason, we built the concept of a Template. The developer (or another operator) can export parts of their flow (or an entire flow)
as a template file and then it can be imported into a different NiFi instance. We have talked about building a registry for such templates,
but that doesn't yet exist. It is a key component that we want to work on, though.

I don't believe that this in any way prevents enterprise-wide deployments of NiFi, as it has been used with great success in extremely large
enterprise deployments as it was growing up. However, there may well be (and probably are) use cases
that you have that I've not considered. I would very much love to chat more about this with you (as well as any other ideas
or concerns that you may have with NiFi) going forward.

Thanks!
-Mark


----------------------------------------
> From: sadunuthula@ebay.com
> To: dev@nifi.incubator.apache.org
> Subject: Re: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]
> Date: Thu, 28 May 2015 12:58:18 +0000
>
> Mark,
>
> Thanks for the response. Are Process Groups the only abstraction for
> maintaining disparate flows? Did you consider the more traditional Open
> Flow/Save Flow approach?
>
> If I start thinking of NiFi as a replacement for enterprise ETL tools like
> Informatica/Ab Initio in the Hadoop world, I would introduce different
> personas ("Administrator": manages and monitors the flows; "ETL Developer":
> develops and deploys the flows; etc.) and build an authorization model around
> them.
>
> It would definitely complicate the model, but would allow for an
> enterprise-wide deployment of NiFi. Would love to discuss more.
>
> Regards
> Seshu Adunuthula

Re: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.
Mark,

Thanks for the response. Are Process Groups the only abstraction for
maintaining disparate flows? Did you consider the more traditional Open
Flow/Save Flow approach?

If I start thinking of NiFi as a replacement for enterprise ETL tools like
Informatica/Ab Initio in the Hadoop world, I would introduce different
personas ("Administrator": manages and monitors the flows; "ETL Developer":
develops and deploys the flows; etc.) and build an authorization model around
them.

It would definitely complicate the model, but would allow for an
enterprise-wide deployment of NiFi. Would love to discuss more.
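
A minimal sketch of what a persona-based authorization model along these
lines could look like. Nothing here is an existing NiFi API: the persona
names, the action list, and the is_allowed helper are all assumptions
made up for illustration.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical operations a user might perform on a flow."""
    VIEW_FLOW = auto()
    MONITOR_FLOW = auto()
    START_STOP_FLOW = auto()
    EDIT_FLOW = auto()
    DEPLOY_FLOW = auto()


# Hypothetical persona -> permitted actions table (an assumption for
# illustration, not something NiFi defines).
PERSONA_PERMISSIONS = {
    "administrator": {Action.VIEW_FLOW, Action.MONITOR_FLOW, Action.START_STOP_FLOW},
    "etl_developer": {Action.VIEW_FLOW, Action.EDIT_FLOW, Action.DEPLOY_FLOW},
}


def is_allowed(persona: str, action: Action) -> bool:
    """Return True if the persona may perform the action.

    Unknown personas are granted nothing.
    """
    return action in PERSONA_PERMISSIONS.get(persona, set())


if __name__ == "__main__":
    print(is_allowed("administrator", Action.MONITOR_FLOW))
    print(is_allowed("administrator", Action.EDIT_FLOW))
    print(is_allowed("etl_developer", Action.DEPLOY_FLOW))
```

The table is deliberately flat: which persona gets which action is the
policy decision under discussion, so it is kept as data rather than code.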

Regards
Seshu Adunuthula


On 5/27/15, 2:18 PM, "Mark Payne" <ma...@hotmail.com> wrote:

>Seshu,
>
>Thanks for the e-mail and for sharing your concerns!
>
>So when we talk about combining multiple sources into a single flow, we
>don't mean that all data should be combined into a single flow. It
>absolutely makes sense to sometimes have very disparate flows! In some of
>the instances we've run, we have dozens or more disparate flows. The idea
>that I wanted to convey in the article is that just because 2 pieces of
>data come from different sources does not mean that they should be
>different flows. But if the data needs to be handled very differently
>then it absolutely should be two different flows. Those flows then can
>live side-by-side within the same instance of NiFi (generally in
>different Process Groups so that the graph is maintainable).
>
>The idea of how to handle security and authorization is definitely an
>ongoing debate. There are really two major approaches here. The first
>approach, which we offer today, is to have a separate instance of NiFi
>when different security and authorization are required. Remote Process
>Groups/site-to-site functionality is then used to send the data between
>flows. The rub here is that if you have many instances it can be
>difficult to manage them.
>
>The other approach would be to allow the security and authorization to
>take place at the Process Group level, rather than the Flow Controller
>level. This would be a very significant amount of work and may make the
>application more difficult to use, if the administrators then had to
>manage each group independently. So there are definitely trade-offs to
>each approach. If you have ideas about how you'd like to see it work,
>please share them so that we can make NiFi as useful as possible.
>
>Thanks
>-Mark


RE: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]

Posted by Mark Payne <ma...@hotmail.com>.
Seshu,

Thanks for the e-mail and for sharing your concerns!

So when we talk about combining multiple sources into a single flow, we don't mean that all data should be combined into a single flow. It absolutely makes sense to sometimes have very disparate flows! In some of the instances we've run, we have dozens or more disparate flows. The idea that I wanted to convey in the article is that just because 2 pieces of data come from different sources does not mean that they should be different flows. But if the data needs to be handled very differently then it absolutely should be two different flows. Those flows then can live side-by-side within the same instance of NiFi (generally in different Process Groups so that the graph is maintainable).

The idea of how to handle security and authorization is definitely an ongoing debate. There are really two major approaches here. The first approach, which we offer today, is to have a separate instance of NiFi when different security and authorization are required. Remote Process Groups/site-to-site functionality is then used to send the data between flows. The rub here is that if you have many instances it can be difficult to manage them.

The other approach would be to allow the security and authorization to take place at the Process Group level, rather than the Flow Controller level. This would be a very significant amount of work and may make the application more difficult to use, if the administrators then had to manage each group independently. So there are definitely trade-offs to each approach. If you have ideas about how you'd like to see it work, please share them so that we can make NiFi as useful as possible.
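
To make the trade-off between the two scopes concrete, here is a minimal sketch of an access check at each level. It assumes a hypothetical model in which a group either carries its own user list or inherits from its parent; the ProcessGroup class and both functions are illustrative stand-ins, not NiFi code or NiFi APIs.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ProcessGroup:
    """Hypothetical stand-in for a group of components on the graph."""
    name: str
    parent: Optional["ProcessGroup"] = None
    # Users explicitly authorized on this group; None means "inherit".
    allowed_users: Optional[Set[str]] = None


def can_access_controller_level(user: str, controller_users: Set[str]) -> bool:
    """Controller-level model: one user list covers every flow in the instance."""
    return user in controller_users


def can_access_group_level(
    user: str, group: ProcessGroup, controller_users: Set[str]
) -> bool:
    """Group-level model: the nearest explicit policy wins.

    Walk up from the group toward the root; if no group on the way sets a
    policy, fall back to the controller-wide list.
    """
    current: Optional[ProcessGroup] = group
    while current is not None:
        if current.allowed_users is not None:
            return user in current.allowed_users
        current = current.parent
    return user in controller_users


if __name__ == "__main__":
    controller_users = {"alice", "bob"}
    root = ProcessGroup("root")
    sensors = ProcessGroup("sensors", parent=root, allowed_users={"alice"})
    parsing = ProcessGroup("parsing", parent=sensors)  # inherits from sensors
    billing = ProcessGroup("billing", parent=root)     # no policy of its own

    print(can_access_controller_level("bob", controller_users))
    print(can_access_group_level("bob", parsing, controller_users))
    print(can_access_group_level("bob", billing, controller_users))
```

The group-level check is more flexible, but every group with its own list is one more policy for an administrator to maintain, which is the usability cost described above.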

Thanks
-Mark

----------------------------------------
> From: sadunuthula@ebay.com
> To: dev@nifi.incubator.apache.org
> Subject: Thoughts on the Blog Article [Apache NiFi: Thinking Differently About DataFlow]
> Date: Tue, 26 May 2015 20:48:16 +0000
>
> Hello Folks,
>
> Finally got to install NiFi and got the sample flows running and read the Blog article at https://blogs.apache.org/nifi/entry/basic_dataflow_design.
>
>> The question was "Is it possible to have NiFi service setup and running and allow for multiple dataflows to be designed and deployed (running) at the same time?"
>
> I understand the author's argument that you can use NiFi to have a single flow with several inputs rather than several disparate flows. But there are multiple advantages to having NiFi manage several disparate flows.
>
> * Managing Flows that have very different transformations
> * Security: Authorization, who has access to what flows, executing flows as a named user instead of a super user.
> * Resource Management: Scheduling the resources across disparate flows
> * Etc
>
> Are there plans for the NiFi service to set up and manage multiple dataflows?
>
> Regards
> Seshu Adunuthula