Posted to issues@nifi.apache.org by "Milan Das (JIRA)" <ji...@apache.org> on 2017/12/20 13:46:00 UTC

[jira] [Comment Edited] (NIFI-4715) ListS3 lists duplicate files when incoming file throughput to S3 is high

    [ https://issues.apache.org/jira/browse/NIFI-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298522#comment-16298522 ] 

Milan Das edited comment on NIFI-4715 at 12/20/17 1:45 PM:
-----------------------------------------------------------

Flow template and screenshot attached; "secureauth/file_107.txt" is the duplicated file.


was (Author: dmilan77):
Flow Templates

> ListS3 lists duplicate files when incoming file throughput to S3 is high
> ------------------------------------------------------------------------
>
>                 Key: NIFI-4715
>                 URL: https://issues.apache.org/jira/browse/NIFI-4715
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>         Environment: All
>            Reporter: Milan Das
>         Attachments: List-S3-dup-issue.xml, screenshot-1.png
>
>
> ListS3 state is implemented using a HashSet, which is not thread safe. When ListS3 operates in multi-threaded mode, it sometimes lists the same file from the S3 bucket more than once; the HashSet data appears to be getting corrupted.
> currentKeys = new HashSet<>(); // not thread safe; needs a thread-safe set, e.g. currentKeys = ConcurrentHashMap.newKeySet();
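
A minimal stand-alone sketch of the suggested fix (not the actual ListS3 source; the demo class, key names, and thread counts below are illustrative assumptions): swapping the plain HashSet for the concurrent set returned by ConcurrentHashMap.newKeySet() makes concurrent add() calls safe, so duplicate keys are reliably detected instead of silently corrupting state.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class CurrentKeysDemo {
        public static void main(String[] args) throws InterruptedException {
            // Thread-safe replacement for `currentKeys = new HashSet<>()`;
            // a raw HashSet can lose updates or corrupt its internal
            // structure when written to from multiple threads.
            Set<String> currentKeys = ConcurrentHashMap.newKeySet();

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int t = 0; t < 4; t++) {
                final int thread = t;
                pool.submit(() -> {
                    for (int i = 0; i < 10_000; i++) {
                        // add() returns false when the key is already
                        // present, so a previously listed object can be
                        // skipped instead of emitted twice.
                        currentKeys.add("bucket/file_" + (i % 1_000) + "_t" + thread);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);

            // With the concurrent set the final size is deterministic
            // (4 threads x 1,000 distinct keys = 4,000); with a plain
            // HashSet the same workload can produce a wrong count or
            // throw at runtime.
            System.out.println("distinct keys: " + currentKeys.size());
        }
    }

ConcurrentHashMap.newKeySet() (Java 8+) is a drop-in Set<String> replacement, which is why it is the fix proposed in the issue comment above.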



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)