Posted to issues@nifi.apache.org by "Pierre Villard (Jira)" <ji...@apache.org> on 2020/06/02 07:02:00 UTC

[jira] [Commented] (NIFI-7501) Generate Flowfile does not scale

    [ https://issues.apache.org/jira/browse/NIFI-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123407#comment-17123407 ] 

Pierre Villard commented on NIFI-7501:
--------------------------------------

If you have multiple success relationships, no additional data is generated in the content repository. NiFi creates a single file there, and each of the X flow files (X being the number of success relationships) only holds a "pointer" to that file. For your use case, I believe the correct approach would be GenerateFlowFile -> DuplicateFlowFile -> ....
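
To illustrate the point about "pointers", here is a minimal toy model in Python. It is not NiFi code and does not use any NiFi API: the names ContentRepository, FlowFile, generate and route_to_relationships are hypothetical and exist only for this sketch. It assumes a repository that stores each payload once and flow files that carry only a reference to it.

```python
from dataclasses import dataclass
from itertools import count


class ContentRepository:
    """Toy stand-in for a content repository: each payload is stored once."""

    def __init__(self):
        self._claims = {}
        self._ids = count(1)

    def write(self, payload: bytes) -> int:
        # Store the payload and hand back an id that acts as the "pointer".
        claim_id = next(self._ids)
        self._claims[claim_id] = payload
        return claim_id

    def bytes_stored(self) -> int:
        return sum(len(p) for p in self._claims.values())


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical flow file: a name plus a reference to stored content."""
    name: str
    claim_id: int


def generate(repo: ContentRepository, size: int) -> FlowFile:
    # One write to the repository per generated flow file.
    return FlowFile("generated", repo.write(b"\0" * size))


def route_to_relationships(flow_file: FlowFile, n: int) -> list:
    # Each outgoing flow file reuses the same claim; no content is copied.
    return [FlowFile(f"{flow_file.name}-{i}", flow_file.claim_id) for i in range(n)]


repo = ContentRepository()
original = generate(repo, 10 * 1024 * 1024)   # 10 MB, as in the report
copies = route_to_relationships(original, 5)  # X = 5 success relationships

print(len(copies), "flow files share claim", original.claim_id)
print(repo.bytes_stored(), "bytes stored in the toy repository")
```

In this sketch, five flow files come out but the repository still holds a single 10 MB payload, which is why fanning out the success relationship looks much faster than generating new content.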

> Generate Flowfile does not scale
> --------------------------------
>
>                 Key: NIFI-7501
>                 URL: https://issues.apache.org/jira/browse/NIFI-7501
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.11.4
>            Reporter: Dennis Jaheruddin
>            Priority: Minor
>         Attachments: generationperformance.xml
>
>
> One of the purposes of Generate Flowfile is load testing. However, unfortunately it often appears to become the bottleneck itself. I have found it not to scale well.
> Example result from my laptop:
> I want to generate messages and bring them to a single processor, let's call it processor X.
> With 1 concurrent task, a batch size of 1, a message size of 10 MB, and uniqueness set to false, it can generate approximately 2 GB/sec.
> Allowing more concurrent tasks or a larger batch size makes no noticeable difference.
> However, if instead of increasing the batch size I route the success relationship to multiple processors that do 'nothing' (like UpdateAttribute), and then bring the success relationships of all these to processor X, I can get much more than 2 GB/sec.
>  
> In conclusion: I don't appear to be hitting a hardware limit, since I am able to generate this number of messages in the inelegant way described above, but no matter how I set up my GenerateFlowFile processor, it will not scale. This suggests there may be a smarter way to generate data when uniqueness is not required.
>  
> I have attached a template to illustrate my findings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)