Posted to users@kafka.apache.org by David Montgomery <da...@gmail.com> on 2015/07/03 06:50:58 UTC

How to shard and replicate with python and kafka

Hi,

I am using Druid, which consumes from Kafka 0.8. I write to Kafka with
kafka-python. I have 1 server and 2 partitions.

Given this setup,

1) How do I shard randomly? I assume a shard == a partition (I hope), and I
assume the client is responsible for sharding.

For example:
rand = random.choice([0, 1])
req = ProduceRequest(topic=test_topic, partition=rand,
                     messages=[create_message(test_payload)])
response = kafka.send_produce_request(payloads=[req], fail_on_error=True)

Is the above best practice?
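
A minimal sketch comparing the random choice above with a round-robin choice, in plain Python with no Kafka calls. The helper names random_partition and round_robin_partitions are hypothetical, and the partition list [0, 1] is assumed from the example above.

```python
import itertools
import random


def random_partition(partitions):
    """Pick one partition id uniformly at random."""
    return random.choice(partitions)


def round_robin_partitions(partitions):
    """Return an endless iterator that cycles through the partition ids."""
    return itertools.cycle(partitions)


# Partition ids of the topic (assumed: 2 partitions, as in the example above).
partitions = [0, 1]

# Random selection: even on average, but short runs can be lopsided.
random_picks = [random_partition(partitions) for _ in range(6)]

# Round-robin selection: exactly even over every full cycle.
rr = round_robin_partitions(partitions)
round_robin_picks = [next(rr) for _ in range(6)]
```

Either value could be passed as partition= in the ProduceRequest above. Round robin gives an exactly even spread, while random selection only evens out over many messages.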

If I extend to 2 servers with 2 partitions, what does sharding look
like?
In my Kafka client:
kafka = KafkaClient('111.111.111.111:9092,222.222.222.222:9092')

So with 2 servers and 2 partitions each, does Kafka write to only one
partition out of the 4 total across the 2 servers? That is, does the client
do a round robin?


In terms of replication, what I gather from the docs is that this happens on
the broker node, not in the Python client. So I assume all topics have a
replication factor of 2 with the setting below.

e.g.

server.properties
# Replication configurations
num.replica.fetchers=2
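
For reference, a hedged sketch of the two broker settings involved, as the Kafka broker configuration documentation describes them: num.replica.fetchers is the number of fetcher threads used to replicate messages from leaders, while default.replication.factor is the replication factor applied to automatically created topics. The value 2 below is an assumption carried over from the question, and a replication factor of 2 needs at least 2 brokers.

```properties
# Threads used to replicate messages from partition leaders (not a replica count)
num.replica.fetchers=2
# Replication factor for automatically created topics (assumed value)
default.replication.factor=2
```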


Thanks