Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/12/13 23:07:00 UTC

[jira] [Commented] (NIFI-4475) Processors that use session.get(batchsize) will yield if multiple inbound connections exist where at least one connection is empty.

    [ https://issues.apache.org/jira/browse/NIFI-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290071#comment-16290071 ] 

ASF GitHub Bot commented on NIFI-4475:
--------------------------------------

GitHub user JPercivall opened a pull request:

    https://github.com/apache/nifi/pull/2337

    NIFI-4475 Changing the get(batchSize) method in StandardProcessSessio…

    …n so that it checks all connections before returning nothing
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [X] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [X] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [X] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [X] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [X] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [X] Have you written or updated unit tests to verify your changes?
    - [X] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [X] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [X] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [X] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [X] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JPercivall/nifi NIFI-4475_Making_session_get_x_round_robin_queues

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2337
    
----
commit 5307d406374c5448d1913a3130459b6724104b10
Author: Joe Percivall <jp...@apache.org>
Date:   2017-12-13T22:17:05Z

    NIFI-4475 Changing the get(batchSize) method in StandardProcessSession so that it checks all connections before returning nothing

----


> Processors that use session.get(batchsize) will yield if multiple inbound connections exist where at least one connection is empty.
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4475
>                 URL: https://issues.apache.org/jira/browse/NIFI-4475
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.3.0
>            Reporter: Matthew Clarke
>              Labels: nifi
>
> There is a difference between how the NiFi framework handles batches of incoming data (session.get(batchsize)) versus 1 FlowFile at a time (session.get()).
> For example, PutSyslog works on batches and PutUDP processes 1 FlowFile at a time.
> With the batch method, a thread is used to poll connection 1 and request a batch of FlowFiles. If it gets at least 1 FlowFile, it sends the FlowFile(s) and the thread ends. On the next thread, it round-robins to the next connection (a looped failure relationship, for example) and requests a batch again. If that connection is empty, the framework assumes there is no work to do and yields the processor for the configured "yield duration". So regardless of run schedule, this processor will not run again for the configured yield duration.
> With processors that only work on 1 FlowFile at a time, the thread will round-robin all the inbound connections until it finds a FlowFile. If it does not find a FlowFile in any connection, the framework will yield the processor for the configured yield duration.
> The intent of yield duration is to keep processors with the default run schedule of 0 sec from using excessive CPU doing nothing; however, in the case of batches it will yield even if FlowFiles exist on another connection. This can have a huge impact on the throughput of processors that use session.get(batchsize).
> There are two possible workarounds for this issue:
> 1. Reduce the configured yield duration. This should improve performance when multiple inbound connections exist (where any connection may normally be empty). The result is better throughput, but at the expense of more CPU usage when all connections are truly empty.
> 2. Only have one inbound connection to processors that work on batches. This can be accomplished by using a funnel.
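
The difference between the two polling behaviors described above can be sketched with a small toy model. This is only an illustration under stated assumptions: BatchPollSketch, withQueues, getBatchSingleConnection and getBatchAllConnections are hypothetical names invented for this example, the connections are modeled as plain in-memory queues of strings, and none of this is the actual StandardProcessSession code or the code in the pull request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model only. All names here are hypothetical; this is not NiFi code. */
final class BatchPollSketch {
    // Each deque stands in for one inbound connection queue.
    private final List<Deque<String>> connections = new ArrayList<>();
    // Round-robin cursor: index of the connection to poll next.
    private int next = 0;

    private BatchPollSketch() {}

    /** Builds a model with one connection per array of FlowFile ids. */
    static BatchPollSketch withQueues(String[]... queues) {
        if (queues.length == 0) {
            throw new IllegalArgumentException("need at least one connection");
        }
        BatchPollSketch sketch = new BatchPollSketch();
        for (String[] q : queues) {
            sketch.connections.add(new ArrayDeque<>(Arrays.asList(q)));
        }
        return sketch;
    }

    /** Behavior described in the issue: poll only the next connection, even if it is empty. */
    List<String> getBatchSingleConnection(int batchSize) {
        Deque<String> queue = connections.get(next);
        next = (next + 1) % connections.size();
        return drain(queue, batchSize);
    }

    /** Behavior named in the pull request title: check all connections before returning nothing. */
    List<String> getBatchAllConnections(int batchSize) {
        for (int i = 0; i < connections.size(); i++) {
            Deque<String> queue = connections.get(next);
            next = (next + 1) % connections.size();
            List<String> batch = drain(queue, batchSize);
            if (!batch.isEmpty()) {
                return batch;
            }
        }
        // Every connection was empty, so an empty result (and a yield) is justified.
        return new ArrayList<>();
    }

    /** Removes up to batchSize items from the head of the queue. */
    private static List<String> drain(Deque<String> queue, int batchSize) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !queue.isEmpty()) {
            batch.add(queue.poll());
        }
        return batch;
    }

    public static void main(String[] args) {
        String[] busy = {"f1", "f2", "f3"};
        String[] idle = {}; // e.g. a looped failure relationship that is normally empty

        BatchPollSketch before = withQueues(busy, idle);
        System.out.println(before.getBatchSingleConnection(2));
        System.out.println(before.getBatchSingleConnection(2));

        BatchPollSketch after = withQueues(busy, idle);
        System.out.println(after.getBatchAllConnections(2));
        System.out.println(after.getBatchAllConnections(2));
    }
}
```

In this model, the second call to getBatchSingleConnection lands on the empty connection and returns nothing even though f3 is still waiting on the other connection, which is the point at which the framework would yield the processor. The second call to getBatchAllConnections skips the empty connection and returns f3 instead.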



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)