Posted to dev@nifi.apache.org by Rick Braddy <rb...@softnas.com> on 2015/09/01 17:59:37 UTC

Design question - restartable long-running processes

Hi,

I have a NiFi design question. To process extremely large files (of any size), we intend to create a processor that reads a file in "chunks" and sends them as a multi-part series of FlowFiles, which avoids using up all available content repository and/or JVM space.

One way would be to create our own state file that contains the latest job information (per thread/job), but that seems very clunky.

The question is: for long-running processes like this that need to be restartable (without starting from the beginning on big files), are there any standard NiFi design patterns we should consider?

Thanks in advance.
Rick
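
For illustration, here is a minimal plain-Java sketch of the chunk-and-checkpoint idea described above. It is not NiFi code: the class name ResumableChunkReader, the state file layout (a single decimal offset), and the readChunk/commit split are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

/** Hypothetical helper (not part of NiFi): reads a file in chunks and checkpoints the offset. */
public class ResumableChunkReader {
    private final Path source;
    private final Path stateFile; // assumed format: one decimal byte offset
    private final int chunkSize;

    public ResumableChunkReader(Path source, Path stateFile, int chunkSize) {
        this.source = source;
        this.stateFile = stateFile;
        this.chunkSize = chunkSize;
    }

    /** Returns the offset of the next unprocessed byte, or 0 if no checkpoint exists. */
    public long loadOffset() {
        if (!Files.exists(stateFile)) {
            return 0L;
        }
        try {
            String text = new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8);
            return Long.parseLong(text.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the chunk at the current checkpoint without advancing it; null at end of file. */
    public byte[] readChunk() {
        long offset = loadOffset();
        try (RandomAccessFile in = new RandomAccessFile(source.toFile(), "r")) {
            if (offset >= in.length()) {
                return null;
            }
            in.seek(offset);
            byte[] buffer = new byte[chunkSize];
            int n = in.read(buffer);
            return n <= 0 ? null : Arrays.copyOf(buffer, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Advances the checkpoint by writing a temp file and moving it over the state file. */
    public void commit(int bytesHandled) {
        long next = loadOffset() + bytesHandled;
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        try {
            Files.write(tmp, Long.toString(next).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, stateFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the caller is expected to call commit only after a chunk has been handed off safely, so a restart re-reads the last uncommitted chunk rather than skipping it. Only one chunk is held in memory at a time.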

Re: Design question - restartable long-running processes

Posted by Bryan Bende <bb...@gmail.com>.
Rick,

There have been a few requests for a first-class state management feature,
and it is definitely on the community's radar.

Right now, a good example of the current approach would probably be the
ListHDFS processor. It uses a combination of a local state file and the
DistributedMapCache controller service.
In a cluster, ListHDFS would be scheduled to run only on the primary node,
so using the DistributedMapCache lets all nodes in the cluster know where
to pick up if the primary node changes.
A few other processors also use the local state file approach; I believe
GetHTTP and GetSolr are two of them.

-Bryan
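
The combination described above (local state plus a cluster-wide cache) can be sketched roughly as follows. This is only an illustration under stated assumptions: SharedCache is a hypothetical interface standing in for a shared cache, not the actual DistributedMapCache API, the in-memory map stands in for the local state file, and restoring the larger of the two values is a choice made for this sketch, not a description of how ListHDFS behaves.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FailoverStateDemo {

    /** Hypothetical stand-in for a cluster-wide cache; NOT the NiFi DistributedMapCache API. */
    public interface SharedCache {
        void put(String key, long value);
        Long get(String key); // null when the key is absent
    }

    /** In-memory implementation used only for this demo. */
    public static class InMemoryCache implements SharedCache {
        private final Map<String, Long> entries = new ConcurrentHashMap<>();

        @Override
        public void put(String key, long value) {
            entries.put(key, value);
        }

        @Override
        public Long get(String key) {
            return entries.get(key);
        }
    }

    /** One node's view of progress: a local copy plus the shared cache. */
    public static class NodeState {
        // Stands in for the node's local state file (assumption for brevity).
        private final Map<String, Long> local = new ConcurrentHashMap<>();
        private final SharedCache shared;

        public NodeState(SharedCache shared) {
            this.shared = shared;
        }

        /** Records progress both locally and in the shared cache. */
        public void save(String key, long position) {
            local.put(key, position);
            shared.put(key, position);
        }

        /** Returns the furthest known position, or 0 when nothing has been recorded. */
        public long restore(String key) {
            long fromLocal = local.getOrDefault(key, 0L);
            Long fromShared = shared.get(key);
            return Math.max(fromLocal, fromShared == null ? 0L : fromShared);
        }
    }
}
```

With two NodeState instances sharing one cache, a position saved by the first node is visible to the second through restore, which is the property that matters when the primary node changes.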

