Posted to issues@nifi.apache.org by "Jasper Knulst (JIRA)" <ji...@apache.org> on 2017/12/06 19:55:01 UTC

[jira] [Created] (NIFI-4675) PublishKafka_0_10 can't use demarcator and kafka key at the same time

Jasper Knulst created NIFI-4675:
-----------------------------------

             Summary: PublishKafka_0_10 can't use demarcator and kafka key at the same time
                 Key: NIFI-4675
                 URL: https://issues.apache.org/jira/browse/NIFI-4675
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.2.0
            Reporter: Jasper Knulst
             Fix For: 1.5.0


At the moment you can't split up a flowfile using a demarcator AND set the Kafka key (kafka.key) attribute for all resulting Kafka records at the same time. The code explicitly prevents this.

Still, it would be a valuable performance boost to be able to use both at the same time whenever one flowfile contains many individual Kafka records. Flowfiles would then not have to be pre-split (which causes an explosion of NiFi overhead) just to set the key.

Note:
Using a demarcator and a Kafka key at the same time will normally give every Kafka record produced from one incoming flowfile the same Kafka key (see REMARK).
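
To make the requested behaviour concrete, here is a minimal sketch in plain Java. It is not the PublishKafka_0_10 code and does not use the Kafka client; the class, method and field names (DemarcatedKeySketch, split, KeyedRecord) are hypothetical and exist only for this illustration. It assumes UTF-8 content and a non-empty demarcator, and it simply pairs every demarcated piece of one flowfile with the same key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the NiFi processor implementation.
final class DemarcatedKeySketch {

    /** One outgoing record: a key (may be null) and a value. */
    static final class KeyedRecord {
        final String key;
        final String value;

        KeyedRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Splits the flowfile content on the demarcator and attaches the same
     * key to every non-empty piece.
     */
    static List<KeyedRecord> split(byte[] content, String demarcator, String kafkaKey) {
        if (demarcator == null || demarcator.isEmpty()) {
            throw new IllegalArgumentException("demarcator must not be empty");
        }
        String text = new String(content, StandardCharsets.UTF_8);
        List<KeyedRecord> records = new ArrayList<>();
        int start = 0;
        while (start <= text.length()) {
            int end = text.indexOf(demarcator, start);
            if (end < 0) {
                end = text.length(); // last piece: no further demarcator
            }
            String piece = text.substring(start, end);
            if (!piece.isEmpty()) {
                records.add(new KeyedRecord(kafkaKey, piece));
            }
            start = end + demarcator.length();
        }
        return records;
    }
}
```

Calling split with the content "a\nb\nc\n", the demarcator "\n" and the key "k1" yields three records, all carrying the key "k1". In a real processor each of these pairs would be handed to the Kafka producer instead of being collected in a list.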

I know of a live NiFi deployment where this fix/feature (provided as a custom fix) led to a 500-600% increase in throughput. Others could and should benefit as well.

REMARK
The argument against this feature has been that it is not a good idea to intentionally generate many duplicate Kafka keys. I would argue that it is up to the user to decide. Most would use Kafka as a pure distributed log system, where key uniqueness is not important. The Kafka key can be a really valuable grouping placeholder, though. The only case where this would become problematic is compaction of Kafka topics, when records with duplicate Kafka keys are deduplicated. But once we put sufficient warnings and disclaimers about this risk in the tooltips, it is up to the user to decide whether to use the performance booster.
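
The compaction risk can be sketched with a deliberately simplified model. This is not Kafka code: CompactionSketch and compact are hypothetical names, and the model only assumes that a compacted topic eventually retains just the latest value per key. Under that assumption, all records split from one flowfile and sent with one shared key could collapse to a single record.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of log compaction; not Kafka's implementation.
final class CompactionSketch {

    /**
     * Takes {key, value} pairs in log order and keeps only the most recent
     * value for each key.
     */
    static Map<String, String> compact(List<String[]> keyValuePairs) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] kv : keyValuePairs) {
            latest.put(kv[0], kv[1]); // a later value overwrites an earlier one
        }
        return latest;
    }
}
```

Feeding this model three records that share the key "k1" leaves only the last of them, which is exactly the situation the tooltips would need to warn about.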



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)