Posted to dev@nifi.apache.org by Eric Ulicny <eu...@umich.edu> on 2018/04/10 17:24:57 UTC

Nifi Parallel Execution

Hello,

We have a use case where we execute processors on all nodes but would like
to use the DetectDuplicate processor to ensure records are unique. We are
observing that we must run it on one node to truly detect duplicates. Is
there any way to merge flowfiles from all running executors?

-Eric

Re: Nifi Parallel Execution

Posted by Eric Ulicny <eu...@umich.edu>.
We have attempted to use the distributed map cache with the DetectDuplicate
processor as recommended, to no avail. The first time two identical
flowfiles are sent simultaneously, they both make it through to the
non-duplicate relationship. After that point they are detected
appropriately.

In particular, we are testing with GenerateFlowFile on a two-node cluster.
We extract the SHA key and use that when detecting duplicates.
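
Assuming the key mentioned above is a SHA digest of the flowfile content, deriving a duplicate-detection key could look roughly like the sketch below. The choice of SHA-256 and the name content_key are assumptions made for illustration; the message does not say which hash or attribute is used.

```python
import hashlib


def content_key(content: bytes) -> str:
    """Return a hex SHA-256 digest to use as the cache key (assumed algorithm)."""
    return hashlib.sha256(content).hexdigest()


first = content_key(b"same payload")
second = content_key(b"same payload")
other = content_key(b"different payload")

print(first == second)  # True: identical content yields the same key
print(first == other)   # False: different content yields a different key
```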

-Eric




On Tue, Apr 10, 2018, 1:32 PM Bryan Bende <bb...@gmail.com> wrote:

> Hello,
>
> DetectDuplicate uses a DistributedMapCacheClientService which would be
> connecting to a DistributedMapCacheServer on one of your nodes.
>
> So all nodes should be connecting to the same cache server which is
> where the information about previously seen data is stored.
>
> -Bryan

Re: Nifi Parallel Execution

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

DetectDuplicate uses a DistributedMapCacheClientService which would be
connecting to a DistributedMapCacheServer on one of your nodes.

So all nodes should be connecting to the same cache server which is
where the information about previously seen data is stored.
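
To illustrate why a single shared cache server matters, here is a minimal sketch that compares two nodes each using their own cache with two nodes sharing one cache. Plain Python sets stand in for the cache, and the names route, run, and arrivals are hypothetical; this is a model of the idea, not NiFi code or configuration.

```python
def route(cache, key):
    """Return 'non-duplicate' the first time a cache sees key, else 'duplicate'."""
    if key in cache:
        return "duplicate"
    cache.add(key)
    return "non-duplicate"


def run(node_caches, arrivals):
    """arrivals is a list of (node_name, key); node_caches maps node -> cache."""
    return [route(node_caches[node], key) for node, key in arrivals]


# The same key arrives twice on each of two nodes.
arrivals = [("node1", "k"), ("node2", "k"), ("node1", "k"), ("node2", "k")]

# Each node talks to its own cache (for example, a client pointed at a
# node-local server instead of one common host).
separate = run({"node1": set(), "node2": set()}, arrivals)

# Both nodes talk to one shared cache.
shared_cache = set()
shared = run({"node1": shared_cache, "node2": shared_cache}, arrivals)

print(separate)  # ['non-duplicate', 'non-duplicate', 'duplicate', 'duplicate']
print(shared)    # ['non-duplicate', 'duplicate', 'duplicate', 'duplicate']
```

With separate caches, the first copy seen on each node passes as a non-duplicate and only later copies are caught; with one shared cache, only the very first copy passes.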

-Bryan
