Posted to user@spark.apache.org by Massimiliano Tomassi <ma...@gmail.com> on 2014/09/02 19:12:50 UTC

Publishing a transformed DStream to Kafka

Hello all,
after applying several transformations to a DStream, I'd like to publish
all the elements of all the resulting RDDs to Kafka. What would be the best
way to do that? Just using DStream.foreach and then RDD.foreach? Is there
any other built-in utility for this use case?

Thanks a lot,
Max

-- 
------------------------------------------------
Massimiliano Tomassi
------------------------------------------------
e-mail: max.tomassi@gmail.com
------------------------------------------------

Re: Publishing a transformed DStream to Kafka

Posted by fr...@typesafe.com.
How about writing to a buffer? Then you would flush the buffer to Kafka if and only if the output operation reports successful completion. In the event of a worker failure, that flush would never happen.
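A minimal sketch of that idea, with a plain function standing in for a Kafka producer (`BufferedWriter`, `stage`, and `flush` are made-up names, not Spark or Kafka API):

```scala
import scala.collection.mutable.ArrayBuffer

// Records are staged locally; nothing reaches the sink until flush().
class BufferedWriter(sink: String => Unit) {
  private val buffer = ArrayBuffer.empty[String]

  def stage(record: String): Unit = buffer += record

  // Call only after the output operation reports successful completion.
  // If the worker fails first, flush() never runs and nothing is published.
  def flush(): Unit = {
    buffer.foreach(sink)
    buffer.clear()
  }
}
```

In a real job the sink would be the producer's send call and the buffer would live inside the output operation. Note this shrinks the window for duplicates rather than eliminating it: a failure in the middle of flush() itself can still republish records on retry.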



—
FG

On Sun, Nov 30, 2014 at 2:28 PM, Josh J <jo...@gmail.com> wrote:

> Is there a way to do this that preserves exactly once semantics for the
> write to Kafka?

Re: Publishing a transformed DStream to Kafka

Posted by Josh J <jo...@gmail.com>.
Is there a way to do this that preserves exactly once semantics for the
write to Kafka?
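Not from this thread, but one common way to approximate exactly-once is to make the write idempotent: derive a deterministic key for every record (e.g. from batch time and offset) so that replaying a failed batch overwrites instead of duplicating. Sketched below with a map standing in for a log-compacted Kafka topic; all names are illustrative:

```scala
import scala.collection.mutable

// A LinkedHashMap stands in for a log-compacted Kafka topic:
// writing the same key twice leaves a single record.
val compacted = mutable.LinkedHashMap.empty[String, String]

// Deterministic key: the same (batchTime, offset) always yields the
// same key, so re-running a failed batch rewrites identical entries.
def publish(batchTime: Long, offset: Long, value: String): Unit =
  compacted(s"$batchTime-$offset") = value

publish(1000L, 0L, "hello")
publish(1000L, 1L, "world")
publish(1000L, 1L, "world") // replay after a failure: no duplicate
```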

On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith <se...@gmail.com> wrote:

> I'd be interested in finding the answer too. Right now, I do:
>
> val kafkaOutMsgs = kafkInMessages.map(x=>myFunc(x._2,someParam))
> kafkaOutMsgs.foreachRDD((rdd,time) => { rdd.foreach(rec => {
> writer.output(rec) }) } ) // where writer.output is a method that takes a
> string and writer is an instance of a producer class.

Re: Publishing a transformed DStream to Kafka

Posted by Tim Smith <se...@gmail.com>.
I'd be interested in finding the answer too. Right now, I do:

val kafkaOutMsgs = kafkInMessages.map(x=>myFunc(x._2,someParam))
kafkaOutMsgs.foreachRDD((rdd,time) => { rdd.foreach(rec => {
writer.output(rec) }) } ) // where writer.output is a method that takes a
string and writer is an instance of a producer class.
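For what it's worth, a common refinement of that snippet is foreachPartition, so that one producer is built per partition on the executor instead of a writer being serialized from the driver. A self-contained sketch (Writer is hypothetical, shaped like the producer class above; plain iterators stand in for RDD partitions):

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for the producer class above: output(String) publishes one
// record (here it just collects into a local buffer).
class Writer(out: ArrayBuffer[String]) {
  def output(rec: String): Unit = out += rec
  def close(): Unit = ()
}

// In Spark this loop body would sit inside
//   kafkaOutMsgs.foreachRDD { rdd => rdd.foreachPartition { records => ... } }
// so each partition creates its own Writer on the executor rather than
// receiving one serialized from the driver. Simulated with iterators here.
val sent = ArrayBuffer.empty[String]
val partitions: Seq[Iterator[String]] = Seq(Iterator("a", "b"), Iterator("c"))
partitions.foreach { records =>
  val writer = new Writer(sent) // one Writer per "partition"
  records.foreach(rec => writer.output(rec))
  writer.close()
}
```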





On Tue, Sep 2, 2014 at 10:12 AM, Massimiliano Tomassi <max.tomassi@gmail.com> wrote:

> Hello all,
> after having applied several transformations to a DStream I'd like to
> publish all the elements in all the resulting RDDs to Kafka. What the best
> way to do that would be? Just using DStream.foreach and then RDD.foreach ?
> Is there any other built in utility for this use case?
>
> Thanks a lot,
> Max