Posted to dev@nifi.apache.org by Gunjan Dave <gu...@gmail.com> on 2016/08/21 14:18:00 UTC

NiFi Real World Usage

Hello NiFi Team,
Firstly, let me congratulate you on the fantastic work that your team is
doing and also on the release of the 1.0 (beta) version. Your team is doing
great, and NiFi shows many possibilities and use cases that are not
currently envisioned or marketed.

To be able to use NiFi to its fullest capacity, can I request your help in
addressing some of my queries?

1) Parallel Development
How would a team of 50 developers work in parallel on designing and
implementing a common data flow? How would version control work? How can we
ensure that the work of one developer is not overwritten by another? Can we
version control the flow.xml file outside of NiFi?
I come from a Microsoft BizTalk background, where the orchestration data
flow design XML is version controlled in TFS. Can we do the same for
flow.xml in git?
Can developers merge their local flow.xml files externally and deploy the
merged flow.xml.gz into UAT?

2) User Authorization
I understand from the docs that user authorization needs to be managed in
an authorization XML file. Is this a manual process? I would have some
50 to 100 developers using it, so it would be a pain to manage the
authorization manually via an XML file.
Is there some other way, like LDAP, for authorization?

3) Can sensitive data, like DBCP connection passwords, be kept in an
external file in encrypted form? If so, how?

4) Functional testing the entire flow
I understand each processor can be run against a mock, but is there a way I
can test the entire flow in an automated manner before going live?
Can there be an option of reading the flow XML file into a separate test
harness and executing specific portions of the flow end to end?
Can MiNiFi be used for this purpose? I understand that MiNiFi was created
for a different purpose, but can it also be used for some form of automated
testing?

5) How does the debug flow processor work? I could not find enough
documentation on it.

6) Is there a detailed infrastructure recommendation and setup guide for
a NiFi cluster? For example, best practices and some sample setup patterns.

Thanking you in advance.

Thanks

Re: NiFi Real World Usage

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

Attempting to answer some of the questions...

1) Some teams have had success putting flow.xml.gz in version control, but
it might be challenging with 50 developers. You can't merge one flow.xml.gz
into another; it can only be dropped into another instance as a full
replacement. A second option is to organize the dataflow using process
groups and have templates for each process group. There has been some work
around automating the deployment of templates [1]. The overall development
lifecycle is definitely an area for some improvements.
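
For the first option (keeping flow.xml.gz in version control), diffs are
easier to read if the uncompressed XML is what gets committed. The following
is only a minimal sketch of that idea, not an official NiFi tool: the
function names are hypothetical, it assumes the file is UTF-8 XML inside a
standard gzip container, and the sample XML is a placeholder rather than a
real flow. It does not make flows mergeable; the result still has to be
deployed as a full replacement.

```python
import gzip
import io


def unpack_flow(gz_bytes: bytes) -> str:
    """Return the XML text stored inside gzip-compressed flow content."""
    # Assumption: the flow file is UTF-8 encoded XML.
    return gzip.decompress(gz_bytes).decode("utf-8")


def pack_flow(xml_text: str) -> bytes:
    """Compress XML text back into gzip form, ready to be dropped into an instance."""
    buf = io.BytesIO()
    # mtime=0 keeps the compressed bytes identical across runs for identical input.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(xml_text.encode("utf-8"))
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder content, not a real NiFi flow definition.
    sample = "<flowController>\n  <rootGroup/>\n</flowController>\n"
    assert unpack_flow(pack_flow(sample)) == sample
```

In practice one would read flow.xml.gz from the NiFi conf directory, commit
the output of unpack_flow, and run pack_flow on the checked-out XML before
copying it into the target instance.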

2) NiFi supports certificates, LDAP, and Kerberos for authentication. For
authorization (in 1.0.0) it supports an internal file-based authorizer, but
all of the management is done through the NiFi UI. The "file-based" part
just refers to the fact that behind the scenes it is storing the users,
groups, and policies in an XML file. In addition, the authorizer is
pluggable and anyone can implement their own for pulling policies from
somewhere else. There is also an authorizer for Apache Ranger provided.

3) Currently sensitive values in processors are encrypted in the
flow.xml.gz and there is also an effort to encrypt sensitive values in
nifi.properties [2]. As the concept of a variable registry continues to
develop I suspect the encrypted property capabilities will apply to that as
well.

4) You could have an integration instance of NiFi running and use the REST
API [3] to orchestrate everything... load a template in, start a processor,
verify the output went somewhere.
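
As a rough illustration of that kind of orchestration, here is a sketch that
only builds the HTTP requests. Everything specific in it is an assumption to
be checked against the REST API documentation [3]: the base URL (an unsecured
local instance), the endpoint paths and JSON fields (written from memory of
the 1.x API), and the helper names, which are hypothetical. It starts a whole
process group rather than a single processor, and uploading the template file
itself is left out.

```python
import json
import urllib.request

# Assumption: an unsecured integration instance on the default local port.
BASE_URL = "http://localhost:8080/nifi-api"


def json_request(method, path, payload=None):
    """Build (but do not send) a request against the assumed REST base URL."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def instantiate_template(group_id, template_id):
    """Request that places an already-uploaded template into a process group."""
    body = {"templateId": template_id, "originX": 0.0, "originY": 0.0}
    return json_request("POST", f"/process-groups/{group_id}/template-instance", body)


def set_group_state(group_id, state):
    """Request that starts or stops every component in a process group."""
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be RUNNING or STOPPED")
    return json_request("PUT", f"/flow/process-groups/{group_id}",
                        {"id": group_id, "state": state})


def group_status(group_id):
    """Request for status counters, which a test can poll to verify output."""
    return json_request("GET", f"/flow/process-groups/{group_id}/status")


def send(req):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A test run would then call send() on each request in order: instantiate the
template, set the group to RUNNING, poll the status until the expected
counts appear, and finally set the group back to STOPPED.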

5) I'm personally not familiar with the DebugFlow processor, but I believe
the idea was to be able to simulate different scenarios for testing. The
documentation is here [4].

6) The admin guide explains how to set up a cluster [5], and it will be
updated a bit for the 1.0.0 release. It is very hard to recommend
hardware/infrastructure requirements because they are very specific to your
flow, but in general NiFi is not meant to be scaled to 100s of nodes.
Usually a 10 node cluster would be considered fairly large, and you can
often do a great deal of processing on a single node.

Hope that helps.

-Bryan

[1] https://github.com/aperepel/nifi-api-deploy
[2] https://issues.apache.org/jira/browse/NIFI-1831
[3] https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
[4] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.DebugFlow/index.html
[5] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering

