Posted to dev@nifi.apache.org by Adam Lamar <ad...@gmail.com> on 2016/07/02 02:34:18 UTC

Re: ListS3 processor question (duplicate files / maintaining state)

ddewaele,

> 2. Sometimes, when syncing files to my S3 buckets, I notice that the ListS3
> processor is picking up the same file twice. Is there a way to avoid that ?

Joe's response is correct. If you upload an object to S3 that
overwrites an existing key, the object's last-modified date changes,
and ListS3 will emit a flowfile for the "new" object with the same key.
Likewise, changes such as updating object metadata or setting
server-side encryption also change the object's last-modified date.
The List->Fetch strategy works well for a directory being used as a
queue, but it doesn't always work as well for monitoring an entire S3
bucket over time.
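
To illustrate why an overwrite is listed again, here is a minimal sketch
assuming a simplified timestamp-based lister. The class, record and method
names are hypothetical and this is not NiFi's actual ListS3 code. It saves
the newest lastModified value seen plus the keys listed at that timestamp.
Even with that state handled correctly, re-uploading the same key yields a
newer lastModified, so the key is emitted a second time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OverwriteRelisting {

    // Hypothetical object summary: key plus lastModified in epoch millis.
    record S3Obj(String key, long lastModified) {}

    // Hypothetical listing state: newest lastModified seen so far and the keys listed at it.
    private long lastTimestamp = Long.MIN_VALUE;
    private final Set<String> keysAtLastTimestamp = new HashSet<>();

    // Returns the keys a timestamp-based lister would emit for one pass over the bucket.
    List<String> list(List<S3Obj> bucket) {
        long newest = lastTimestamp;
        Set<String> newestKeys = new HashSet<>(keysAtLastTimestamp);
        List<String> emitted = new ArrayList<>();
        for (S3Obj o : bucket) {
            // Already handled: older than the saved timestamp, or listed at exactly that timestamp.
            boolean seen = o.lastModified() < lastTimestamp
                    || (o.lastModified() == lastTimestamp && keysAtLastTimestamp.contains(o.key()));
            if (seen) {
                continue;
            }
            emitted.add(o.key());
            if (o.lastModified() > newest) {
                newest = o.lastModified();
                newestKeys.clear();
            }
            if (o.lastModified() == newest) {
                newestKeys.add(o.key());
            }
        }
        lastTimestamp = newest;
        keysAtLastTimestamp.clear();
        keysAtLastTimestamp.addAll(newestKeys);
        return emitted;
    }

    public static void main(String[] args) {
        OverwriteRelisting lister = new OverwriteRelisting();
        System.out.println(lister.list(List.of(new S3Obj("report.csv", 1000L))));
        // Same key overwritten (or its metadata changed): S3 reports a newer lastModified.
        System.out.println(lister.list(List.of(new S3Obj("report.csv", 2000L))));
        // Nothing changed since the previous pass, so nothing is emitted.
        System.out.println(lister.list(List.of(new S3Obj("report.csv", 2000L))));
    }
}
```

Running main prints [report.csv], then [report.csv] again for the
overwrite, then [] once nothing has changed.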

You may be able to achieve finer grained control using event
notifications and an SQS queue, which I wrote about a while back:
https://adamlamar.github.io/2016-01-30-monitoring-an-s3-bucket-in-apache-nifi/

I suspect this will behave a bit closer to your expectations, and the
latency from object creation to NiFi receiving the event should be
much shorter as well.

Hope that helps,
Adam

Re: ListS3 processor question (duplicate files / maintaining state)

Posted by dmilan77 <mi...@gmail.com>.

I opened a ticket:
https://issues.apache.org/jira/browse/NIFI-4715

*Root cause:*
The duplicate appears when a file is uploaded to S3 while a ListS3 listing
is in progress. In onTrigger, maxTimestamp is initialized to 0L, so the
block quoted below clears the saved keys.

An S3 object whose lastModified time equals currentTimestamp, and whose key
was already listed, should be skipped. Because the keys have been cleared,
the same file is loaded again.
I think the fix is to initialize maxTimestamp with currentTimestamp instead
of 0L:
{code}
// Proposed: start from the persisted timestamp instead of 0L
long maxTimestamp = currentTimestamp;
{code}

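The following block clears the keys:
{code:title=org.apache.nifi.processors.aws.s3.ListS3.java|borderStyle=solid}
// With maxTimestamp starting at 0L, the first object that reaches this block
// always satisfies the condition, so the keys saved by the previous run are discarded.
if (lastModified > maxTimestamp) {
    maxTimestamp = lastModified;
    currentKeys.clear();
    getLogger().debug("clearing keys");
}
{code}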
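
The scenario can be reproduced with a small simulation. This is a sketch
under stated assumptions: Lister, S3Obj and onTrigger below are hypothetical
stand-ins that model only the timestamp/key state described in this ticket,
not the real ListS3 processor. The skip rule and the clearing block follow
the description above; updating state only when something was emitted is an
assumption. The flag chooses between the current 0L initialization and the
proposed currentTimestamp initialization.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListS3Simulation {

    // Hypothetical stand-in for an S3 object summary (lastModified in epoch millis).
    record S3Obj(String key, long lastModified) {}

    // Hypothetical, simplified model of the ListS3 state; not the processor's implementation.
    static final class Lister {
        long currentTimestamp = 0L;                      // persisted between runs
        final Set<String> currentKeys = new HashSet<>(); // keys listed at currentTimestamp
        private final boolean initWithCurrentTimestamp;  // true = proposed fix

        Lister(boolean initWithCurrentTimestamp) {
            this.initWithCurrentTimestamp = initWithCurrentTimestamp;
        }

        // One onTrigger pass: returns the keys that would become flowfiles.
        List<String> onTrigger(List<S3Obj> listing) {
            long maxTimestamp = initWithCurrentTimestamp ? currentTimestamp : 0L;
            List<String> emitted = new ArrayList<>();
            for (S3Obj obj : listing) {
                long lastModified = obj.lastModified();
                // Skip objects older than the saved timestamp, or already listed at it.
                if (lastModified < currentTimestamp
                        || (lastModified == currentTimestamp && currentKeys.contains(obj.key()))) {
                    continue;
                }
                emitted.add(obj.key());
                // Clearing block: with maxTimestamp = 0L this fires on the first emitted
                // object and discards the keys saved by the previous run.
                if (lastModified > maxTimestamp) {
                    maxTimestamp = lastModified;
                    currentKeys.clear();
                }
                if (lastModified == maxTimestamp) {
                    currentKeys.add(obj.key());
                }
            }
            // Assumption: state is only updated when something was emitted.
            if (!emitted.isEmpty()) {
                currentTimestamp = maxTimestamp;
            }
            return emitted;
        }
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            Lister lister = new Lister(fixed);
            // Run 1 sees only a.txt; b.txt arrives with the same timestamp during that listing.
            List<S3Obj> first = List.of(new S3Obj("a.txt", 1000L));
            List<S3Obj> later = List.of(new S3Obj("a.txt", 1000L), new S3Obj("b.txt", 1000L));
            System.out.println((fixed ? "maxTimestamp = currentTimestamp: " : "maxTimestamp = 0L: ")
                    + lister.onTrigger(first) + " " + lister.onTrigger(later) + " "
                    + lister.onTrigger(later));
        }
    }
}
```

With the 0L initialization the third run emits [a.txt, b.txt] again, even
though nothing changed in the bucket; with maxTimestamp initialized to
currentTimestamp the third run emits [].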

--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/