Posted to users@kafka.apache.org by Ming Zhao <mz...@gmail.com> on 2015/04/28 20:13:38 UTC
Writing Spark RDDs into Kafka
Hi,
I wonder if anyone has a good example of how to write Spark RDDs into
Kafka. Specifically, my question is whether there is an advantage to
sending a list of messages at a time over sending one message at a time.
Sample code for sending one message at a time:
dStream.foreachRDD(rdd => {
  rdd.collect().foreach(myObj => {
    // do something to convert myObj to a KeyedMessage
    val keyedMessage: KeyedMessage[String, String] = ???
    producer.send(keyedMessage)
  })
})
Sample code for sending a list of messages:
dStream.foreachRDD(rdd => {
  val messages = new ListBuffer[KeyedMessage[String, String]]()
  rdd.collect().foreach(myObj => {
    // do something to convert myObj to a KeyedMessage
    val keyedMessage: KeyedMessage[String, String] = ???
    messages += keyedMessage
  })
  // the old producer's send takes varargs, so the list is expanded
  producer.send(messages.toList: _*)
})
I noticed that in 0.8.2.1 the KafkaProducer class only sends one
ProducerRecord at a time, and I don't know of another way to send a list.
So I am really interested to know if there are any examples of writing
Spark RDDs into Kafka with the latest Kafka library.
Thanks,
Ming Zhao