You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Kishore Nallan (JIRA)" <ji...@apache.org> on 2015/12/12 09:36:46 UTC

[jira] [Updated] (SAMZA-839) KafkaSystemProducer should use the same partitioning hash function as Kafka's producer

     [ https://issues.apache.org/jira/browse/SAMZA-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kishore Nallan updated SAMZA-839:
---------------------------------
    Description: 
Samza's KafkaSystemProducer class generates the partition key using:

{{abs(envelope.getPartitionKey.hashCode()) % numPartitions}}

However, Kafka's producer generates the partition key this way:

{{Utils.abs(Utils.murmur2(record.key())) % numPartitions}}

This makes it difficult for me to join 2 data sources on a common key when one source is produced by Samza and the other by a default Kafka producer.

As a work-around, I have to modify the upstream job (that uses the default kafka producer) to write with an explicit partition key using Samza's hashing logic. 

  was:
Samza's KafkaSystemProducer class generates the partition key using:

abs(envelope.getPartitionKey.hashCode()) % numPartitions

However, Kafka's producer generates the partition key this way:

Utils.abs(Utils.murmur2(record.key())) % numPartitions

This makes it difficult for me to join 2 data sources on a common key when one source is produced by Samza and the other by a default Kafka producer.

As a work-around, I have to modify the upstream job (that uses the default kafka producer) to write with an explicit partition key using Samza's hashing logic. 


> KafkaSystemProducer should use the same partitioning hash function as Kafka's producer
> --------------------------------------------------------------------------------------
>
>                 Key: SAMZA-839
>                 URL: https://issues.apache.org/jira/browse/SAMZA-839
>             Project: Samza
>          Issue Type: Bug
>          Components: kafka
>    Affects Versions: 0.9.1
>            Reporter: Kishore Nallan
>
> Samza's KafkaSystemProducer class generates the partition key using:
> {{abs(envelope.getPartitionKey.hashCode()) % numPartitions}}
> However, Kafka's producer generates the partition key this way:
> {{Utils.abs(Utils.murmur2(record.key())) % numPartitions}}
> This makes it difficult for me to join 2 data sources on a common key when one source is produced by Samza and the other by a default Kafka producer.
> As a work-around, I have to modify the upstream job (that uses the default kafka producer) to write with an explicit partition key using Samza's hashing logic. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)