Posted to issues@nifi.apache.org by "Joseph Witt (JIRA)" <ji...@apache.org> on 2016/12/06 15:01:58 UTC

[jira] [Created] (NIFI-3156) PublishKafka performance without demarcator should be comparable to with demarcator

Joseph Witt created NIFI-3156:
---------------------------------

             Summary: PublishKafka performance without demarcator should be comparable to with demarcator
                 Key: NIFI-3156
                 URL: https://issues.apache.org/jira/browse/NIFI-3156
             Project: Apache NiFi
          Issue Type: Improvement
            Reporter: Joseph Witt


The PublishKafka processor supports specification of a demarcator property, which allows it to scan the incoming input stream and demarcate the individual messages it writes to Kafka.  When the demarcator is used, performance is quite reasonable and fast, and that makes sense: all items in the bundle are sent as a single interaction with Kafka and the appropriate ack is received.
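
As a rough sketch of why the demarcated case is fast, assuming the standard kafka-clients KafkaProducer API (the class and helper below are illustrative, not the processor's actual code): each demarcated message becomes its own record, but the whole bundle is flushed together and acknowledged as a group.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class DemarcatedPublishSketch {

    // Publishes every demarcated message as its own record but treats the whole
    // bundle as one interaction: everything is flushed together and the caller
    // blocks on the returned futures to confirm the acks.  The messages list is
    // assumed to have been produced by scanning the stream for the demarcator.
    static List<Future<RecordMetadata>> publishBundle(final KafkaProducer<byte[], byte[]> producer,
                                                      final String topic,
                                                      final List<byte[]> messages) {
        final List<Future<RecordMetadata>> acks = new ArrayList<>();
        for (final byte[] message : messages) {
            acks.add(producer.send(new ProducerRecord<>(topic, message)));   // send() only enqueues
        }
        producer.flush();   // push the entire bundle out in one go
        return acks;
    }
}

The caller would block on each returned Future (e.g. via get()) before routing the FlowFile to success, which corresponds to the single ack exchange described above.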

However, when that same processor is used without the demarcator, performance is noticeably slower.  That also makes sense: the bundle is again sent as a single interaction with Kafka, but in that case the bundle is a single event.

To work around this today one can simply place MergeContent before PublishKafka to bundle a precise amount of data together.  With MergeContent the user can specify the maximum number of items to combine and the maximum amount of time to wait before doing so.
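
As a sketch of that workaround (the property names are MergeContent's; the values shown are only examples), the MergeContent processor in front of PublishKafka might be configured roughly as:

    Merge Strategy             = Bin-Packing Algorithm
    Maximum Number of Entries  = 1000
    Max Bin Age                = 5 sec
    Delimiter Strategy         = Text
    Demarcator                 = \n

PublishKafka would then be configured with the same newline demarcator so the merged bundle is split back into individual Kafka messages.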

We should consider adding support for specifying the maximum number of objects to send together in a single interaction with Kafka, and thus avoid the need for demarcation or a MergeContent step preceding this processor.
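
A rough sketch of what that could look like inside a processor's onTrigger, assuming the NiFi processor API (the property descriptor, relationships, and publishAll helper below are hypothetical stand-ins for the real PublishKafka internals):

import java.util.List;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.util.StandardValidators;

public class BatchedPublishSketch extends AbstractProcessor {

    // Hypothetical property for the proposed "max objects per interaction" knob.
    static final PropertyDescriptor MAX_BATCH_COUNT = new PropertyDescriptor.Builder()
            .name("Max FlowFiles Per Batch")
            .description("Maximum number of FlowFiles to send to Kafka in a single interaction")
            .required(true)
            .defaultValue("100")
            .addValidator(StandardValidators.POSITIVE_INTEGER_VALIDATOR)
            .build();

    static final Relationship REL_SUCCESS = new Relationship.Builder().name("success").build();
    static final Relationship REL_FAILURE = new Relationship.Builder().name("failure").build();

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) {
        final int maxBatchCount = context.getProperty(MAX_BATCH_COUNT).asInteger();
        final List<FlowFile> flowFiles = session.get(maxBatchCount);   // pull up to N FlowFiles at once
        if (flowFiles.isEmpty()) {
            return;
        }
        try {
            publishAll(flowFiles, session);                 // one record per FlowFile, wait for all acks
            session.transfer(flowFiles, REL_SUCCESS);
        } catch (final Exception e) {
            session.transfer(flowFiles, REL_FAILURE);       // the whole bundle gets retried later
        }
    }

    private void publishAll(final List<FlowFile> flowFiles, final ProcessSession session) {
        // Placeholder for the actual Kafka publishing logic (see the producer sketch above).
    }
}

ProcessSession.get(int) already allows pulling several FlowFiles at once, so the main change would be exposing the batch size as a property and treating the whole list as one publish/ack cycle.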

We need both options because in one case we could truly receive bundles of events from an external system and would not want to waste time/resources splitting the data when we could just split it logically while sending to Kafka.  The new property would let the user choose how many events, at most, to send at once.

The tradeoff is that the more items there are in a single bundle, the more likely duplicates become.  The interface favors ensuring zero loss but is susceptible to duplication in the presence of failure: if, for example, a failure occurs after most of a large bundle has already been accepted by Kafka, the whole bundle is retried and the already-accepted messages become duplicates.  In other words, "at-least once" delivery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)