Posted to users@kafka.apache.org by Ming Zhao <mz...@gmail.com> on 2015/04/28 20:13:38 UTC

Writing Spark RDDs into Kafka

Hi,

I wonder if anyone has a good example of how to write Spark RDDs into
Kafka.  Specifically, my question is whether there is an advantage to
sending a list of messages each time over sending one message at a time.

Sample code for sending one message at a time:

dStream.foreachRDD(rdd => {
  rdd.collect().foreach(myObj => {
    // do something to convert myObj to a KeyedMessage
    val keyedMessage: KeyedMessage[String, String] = ???
    producer.send(keyedMessage)
  })
})
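(As an aside, collect() pulls the whole RDD back to the driver; a per-partition variant keeps the sends on the executors. This is only a sketch: createProducer() is a hypothetical helper that builds a kafka.producer.Producer[String, String] from a ProducerConfig, since producer instances are not serializable and must be created on each executor.)

```scala
dStream.foreachRDD(rdd => {
  rdd.foreachPartition(partition => {
    // Hypothetical helper: constructs a producer on the executor for this partition
    val producer = createProducer()
    partition.foreach(myObj => {
      // do something to convert myObj to a KeyedMessage
      val keyedMessage: KeyedMessage[String, String] = ???
      producer.send(keyedMessage)
    })
    producer.close()
  })
})
```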

Sample code for sending a list of messages:

dStream.foreachRDD(rdd => {
  val messages = new ListBuffer[KeyedMessage[String, String]]()
  rdd.collect().foreach(myObj => {
    // do something to convert myObj to a KeyedMessage
    val keyedMessage: KeyedMessage[String, String] = ???
    messages += keyedMessage
  })
  producer.send(messages: _*) // Producer.send takes varargs, so expand the list
})

I noticed that in 0.8.2.1 the KafkaProducer class only sends one
ProducerRecord at a time, and I don't know whether there is another way to
send a list.  So I am really interested to know whether there are any
examples of writing Spark RDDs into Kafka with the latest Kafka library.
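(For what it's worth, with the new KafkaProducer the per-record send() is asynchronous and the client batches records internally, controlled by the batch.size and linger.ms settings, so there may be no need for a send(list) method. A rough per-partition sketch, assuming String keys and values and a broker at localhost:9092:)

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

dStream.foreachRDD(rdd => {
  rdd.foreachPartition(partition => {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumption: broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    partition.foreach(myObj => {
      // do something to convert myObj to a ProducerRecord
      val record: ProducerRecord[String, String] = ???
      producer.send(record) // asynchronous; batched client-side before the network send
    })
    producer.close() // flushes any buffered records
  })
})
```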

Thanks,

Ming Zhao