Posted to users@nifi.apache.org by Jeremy Pemberton-Pigott <fu...@gmail.com> on 2021/02/15 03:45:25 UTC

Detect duplicate record reader

Hi everyone, I'm wondering if there is a Detect Duplicate processor that
can read records from a flow file and output just the non-duplicates
(single records would work, but a group of non-duplicates would be
better).  I want to use a record reader so that I don't have to split the
JSON content into tens of thousands of flow files just to detect the
duplicates.  Immediately after this step, a record reader/writer sends the
data to HBase.

Jeremy
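
For readers following the thread, the behaviour asked for above (dropping
duplicate records inside one flow file instead of splitting it) can be
sketched in plain Python.  This is only an illustration of the logic, not a
NiFi processor or any NiFi API: the function name drop_duplicate_records,
the key field "id", and the sample content are all hypothetical, and it
assumes the flow file content is a single JSON array of objects.

```python
import json


def drop_duplicate_records(flow_file_content, key_fields):
    """Keep the first record seen for each key; drop later duplicates.

    flow_file_content: str holding one JSON array of objects (assumption).
    key_fields: names of the fields that together identify a record
                (hypothetical; choose whatever defines "duplicate" for you).
    """
    records = json.loads(flow_file_content)
    seen = set()
    unique = []
    for record in records:
        # Serialise each key value so nested or unhashable values still work.
        key = tuple(
            json.dumps(record.get(field), sort_keys=True) for field in key_fields
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    content = '[{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]'
    # Only the first record with id 1 survives.
    print(json.dumps(drop_duplicate_records(content, ["id"])))
```

Note that this sketch only removes duplicates within one batch of records.
Detecting duplicates across flow files would need some shared state, such
as a cache, which is not shown here.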

Re: Detect duplicate record reader

Posted by Jeremy Pemberton-Pigott <fu...@gmail.com>.
Thanks for the replies, guys.  Yes, NIFI-6047 is pretty much exactly what
I'm looking for.

Jeremy

Re: Detect duplicate record reader

Posted by Chris Sampson <ch...@naimuri.com>.
NIFI-6047 [1] is possibly what you're after, but that won't help you just
now because it appears to remain unfinished.


[1] https://issues.apache.org/jira/browse/NIFI-6047

Cheers,

Chris Sampson

Re: Detect duplicate record reader

Posted by Jorge Machado <jo...@me.com>.
Hey Jeremy, 

Something like this?
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.DetectDuplicate/index.html

