Posted to issues@nifi.apache.org by "Joseph Witt (JIRA)" <ji...@apache.org> on 2017/12/06 19:57:00 UTC

[jira] [Updated] (NIFI-4675) PublishKafka_0_10 can't use demarcator and kafka key at the same time

     [ https://issues.apache.org/jira/browse/NIFI-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Witt updated NIFI-4675:
------------------------------
    Fix Version/s:     (was: 1.5.0)

> PublishKafka_0_10 can't use demarcator and kafka key at the same time
> ---------------------------------------------------------------------
>
>                 Key: NIFI-4675
>                 URL: https://issues.apache.org/jira/browse/NIFI-4675
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 1.2.0
>            Reporter: Jasper Knulst
>              Labels: performance
>
> At the moment you can't split up a FlowFile using a demarcator AND set the Kafka key (kafka.key) attribute for all resulting Kafka records at the same time. The code explicitly prevents this.
> Still, it would be a valuable performance booster to allow both at once wherever one FlowFile contains many individual Kafka records: FlowFiles would no longer have to be pre-split (an explosion of NiFi overhead) just to set the key (a sketch of the intended behaviour follows below the issue text).
> Note:
> Using a demarcator and a Kafka key at the same time will normally cause every Kafka record produced from one incoming FlowFile to carry the same Kafka key (see REMARK).
> I know of a live NiFi deployment where this fix/feature (provided as a custom fix) led to a 500-600% increase in throughput. Others could and should benefit as well.
> REMARK
> The argument against this feature has been that it is not a good idea to intentionally generate many duplicate Kafka keys. I would argue that this is up to the user to decide. Most use Kafka as a pure distributed log system where key uniqueness is not important, and the Kafka key can be a really valuable grouping placeholder. The only case where this gets problematic is compaction of Kafka topics, where records with duplicate keys are deduplicated. But with sufficient warnings and disclaimers about this risk in the tooltips, it is up to the user to decide whether to use the performance booster.
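
A minimal sketch (not NiFi or PublishKafka_0_10 code) of the behaviour the issue asks for, using the plain Kafka producer API: one payload is split on a demarcator and every resulting record is published with the same key. The broker address, topic name, demarcator and key values below are illustrative assumptions.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class DemarcatedPublishSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            String flowFileContent = "record-1\nrecord-2\nrecord-3"; // stand-in for FlowFile content
            String demarcator = "\n";                                 // stand-in for the demarcator property
            String kafkaKey = "order-42";                             // stand-in for the kafka.key attribute

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Every record produced from this single payload carries the same key,
                // which is the semantics the issue asks PublishKafka_0_10 to permit.
                for (String value : flowFileContent.split(demarcator)) {
                    producer.send(new ProducerRecord<>("my-topic", kafkaKey, value));
                }
            }
        }
    }

Note that all records in this sketch share one key, so on a compacted topic they would eventually collapse to a single record; on a regular (delete-policy) topic the duplicate keys are harmless, which is the trade-off discussed in the REMARK above.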



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)