Posted to users@nifi.apache.org by Eric Chaves <er...@uolet.com> on 2019/04/10 17:27:50 UTC

Advice on orchestrating Nifi with dockerized external services

Hi Folks,

My company uses NiFi for several data-flow processes, and we now have a
requirement to do some fairly complex ETL over large files. To process
those files we have some proprietary applications (mostly written in
Python or Go) that run as Docker containers.

I don't think porting those apps to NiFi processors would produce a good
result, given each app's complexity.

We would also like to keep using the NiFi queues so we can monitor overall
progress as we already do (we run several other NiFi flows), so for now we
are ruling out solutions that, for example, submit files to an external
queue like SQS or RabbitMQ for consumption.

So far we have come up with a workflow that would:

   1. have a Kubernetes cluster of jobs periodically query the NiFi queue
   for new FlowFiles and pull one when a file arrives;
   2. download the file content (which is already stored outside of NiFi)
   and process it;
   3. submit the result back to NiFi (using an HTTP listener processor
   such as ListenHTTP) to trigger the subsequent NiFi flow (rough sketch
   just below this list).
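
For step 3 we are picturing something like the rough Python sketch below;
the host, port and base path are just placeholders for however the
ListenHTTP processor would actually be configured:

import requests

# Rough sketch of step 3: push a processed result back into NiFi.
# "nifi-host", port 8081 and the "contentListener" base path are
# placeholders for whatever ListenHTTP is configured with on our side.
def submit_result(result_path):
    with open(result_path, "rb") as f:
        resp = requests.post(
            "http://nifi-host:8081/contentListener",
            data=f,
            headers={
                "Content-Type": "application/octet-stream",
                # Custom headers can become FlowFile attributes if
                # ListenHTTP is configured to pick them up.
                "job-id": "example-job-123",
            },
            timeout=60,
        )
    resp.raise_for_status()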


For steps 1 and 2, we are so far considering two possible approaches:

A) use a MiNiFi container together with the app container in a sidecar
design; MiNiFi would connect to our NiFi cluster and download the files to
a local volume for the app container to process.
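
On the app side, approach A would boil down to something like this rough
Python sketch (the shared volume paths are placeholders; the MiNiFi
sidecar would be the one dropping files into the input directory):

import os
import shutil
import time

# Rough sketch of the app container in the sidecar design: the MiNiFi
# sidecar writes incoming files to a shared volume, the app picks them
# up, runs the ETL and leaves the results in an output directory.
INPUT_DIR = "/data/incoming"    # placeholder shared-volume path
OUTPUT_DIR = "/data/processed"  # placeholder shared-volume path

def process_file(path):
    # Placeholder for the proprietary ETL logic; a real job would read
    # the file, transform it and write the result.
    result = os.path.join(OUTPUT_DIR, os.path.basename(path))
    shutil.move(path, result)
    return result

while True:
    for name in os.listdir(INPUT_DIR):
        process_file(os.path.join(INPUT_DIR, name))
    time.sleep(5)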

B) use the NiFi REST API to query and consume FlowFiles from the queue
(rough sketch below).
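
For approach B we are picturing something along these lines (a rough
Python sketch against the NiFi REST API; host, connection id and
security are placeholders, and as far as I can tell listing a queue does
not remove the FlowFile, so actually consuming it is still an open
point):

import time
import requests

NIFI = "http://nifi-host:8080/nifi-api"        # placeholder host
CONNECTION_ID = "replace-with-connection-uuid" # placeholder connection id

def list_queue():
    # Ask NiFi to build a listing of the queue, then poll until it is
    # finished. In a cluster, the clusterNodeId from each summary may
    # also need to be passed along on later calls.
    req = requests.post(
        f"{NIFI}/flowfile-queues/{CONNECTION_ID}/listing-requests").json()
    url = (f"{NIFI}/flowfile-queues/{CONNECTION_ID}"
           f"/listing-requests/{req['listingRequest']['id']}")
    try:
        while True:
            listing = requests.get(url).json()["listingRequest"]
            if listing.get("finished"):
                return listing.get("flowFileSummaries", [])
            time.sleep(1)
    finally:
        requests.delete(url)  # clean up the listing request afterwards

def fetch_content(flowfile_uuid):
    # Download the content of one FlowFile that is sitting in the queue.
    url = (f"{NIFI}/flowfile-queues/{CONNECTION_ID}"
           f"/flowfiles/{flowfile_uuid}/content")
    resp = requests.get(url)
    resp.raise_for_status()
    return resp.content

summaries = list_queue()
if summaries:
    data = fetch_content(summaries[0]["uuid"])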

One requirement is that, if needed, we can manually scale up the app
cluster so that multiple containers consume queued files in parallel.

Do you guys recommend one over the other (or a third approach)? Any
pitfalls you can foresee?

Would be really glad to hear your thoughts on this matter.

Best regards,

Eric

Re: Advice on orchestrating Nifi with dockerized external services

Posted by Eric Chaves <er...@uolet.com>.
Hi Koji, that seems like a pretty good idea, thanks for bringing it up! I
wasn't aware of nanofi but will definitely give it a shot. =)

Thanks

On Wed, Apr 10, 2019 at 10:38 PM Koji Kawamura <ij...@gmail.com>
wrote:

> [quoted message trimmed]

Re: Advice on orchestrating Nifi with dockerized external services

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Eric,

Although my knowledge of MiNiFi, Python and Go is limited, I wonder if
the "nanofi" library could be used from the proprietary applications so
that they can fetch FlowFiles directly using the Site-to-Site protocol.
That could be an interesting approach, and it would eliminate the need
to store data on a local volume (mentioned in possible approach A).
https://github.com/apache/nifi-minifi-cpp/tree/master/nanofi

The latest MiNiFi (C++) version 0.6.0 was released recently.
https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes

Thanks,
Koji

On Thu, Apr 11, 2019 at 2:28 AM Eric Chaves <er...@uolet.com> wrote:
> [quoted message trimmed]