Posted to dev@spark.apache.org by aakash aakash <em...@gmail.com> on 2016/11/15 18:58:34 UTC

Fwd: using Spark Streaming with Kafka 0.9/0.10

Re-posting it to the dev group.

Thanks and Regards,
Aakash


---------- Forwarded message ----------
From: aakash aakash <em...@gmail.com>
Date: Mon, Nov 14, 2016 at 4:10 PM
Subject: using Spark Streaming with Kafka 0.9/0.10
To: user-subscribe@spark.apache.org


Hi,

I am planning to use Spark Streaming to consume messages from Kafka 0.9. I
have a couple of questions regarding this:


   - I see the APIs are annotated with @Experimental. Can you please tell me
   when they are planned to be production ready?
   - Currently the integration targets Kafka 0.10, and I am curious why it
   did not start with Kafka 0.9 instead. As I understand it, the 0.10 Kafka
   client is not compatible with the 0.9 client, since some arguments in the
   consumer API changed.
   - The current API extends InputDStream, which per the documentation means
   RDDs are generated by running a service/thread only on the driver node
   rather than on worker nodes. Can you please explain why this is done and
   what is required to make it run on worker nodes?


Thanks in advance!

Regards,
Aakash
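
[Editor's note: for illustration, here is a minimal sketch of the kind of
consumption the questions assume, using the 0.10 direct-stream integration.
The broker address, group id, and topic name are placeholder values, and
this API was still marked @Experimental at the time.]

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    val conf = new SparkConf().setAppName("KafkaDirectStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",               // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",                       // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // The direct stream: batch offsets are planned on the driver, but the
    // Kafka consumers that actually fetch records run on the executors.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("example-topic"), kafkaParams)
    )

    stream.map(r => (r.key, r.value)).print()

    ssc.start()
    ssc.awaitTermination()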

Re: using Spark Streaming with Kafka 0.9/0.10

Posted by aakash aakash <em...@gmail.com>.
Thanks for the link and info, Cody!


Regards,
Aakash



Re: using Spark Streaming with Kafka 0.9/0.10

Posted by Cody Koeninger <co...@koeninger.org>.
Generating / defining an RDD is not the same thing as running the
compute() method of an RDD.  The direct stream definitely runs Kafka
consumers on the executors.

If you want more info, the blog post and video linked from
https://github.com/koeninger/kafka-exactly-once refer to the 0.8
implementation, but the general design is similar for the 0.10
version.

I think the likelihood of an official release supporting 0.9 is fairly
slim at this point; it's a year out of date and would not be a drop-in
dependency change.
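
[Editor's note: to make the driver/executor split concrete, a minimal
sketch, assuming the 0.10 integration and the stream value from the earlier
sketch. The offset ranges for each batch are defined on the driver; the
records are only fetched on the executors once the partitions are computed.]

    import org.apache.spark.streaming.kafka010.HasOffsetRanges

    stream.foreachRDD { rdd =>
      // Defining the batch's KafkaRDD and its offset ranges happens here,
      // on the driver; no records have been fetched yet.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
      }
      // Only when an action computes the partitions do the Kafka consumers
      // on the executors actually poll records.
      println(s"records in this batch: ${rdd.count()}")
    }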




Re: using Spark Streaming with Kafka 0.9/0.10

Posted by aakash aakash <em...@gmail.com>.
> You can use the 0.8 artifact to consume from a 0.9 broker

We are currently using "Camus
<http://docs.confluent.io/1.0/camus/docs/intro.html>" in production, and one
of the main goals of moving to Spark is to use the new Kafka consumer API
from Kafka 0.9; in our case we need the security provisions available in
0.9, which is why we cannot use the 0.8 client.

> Where are you reading documentation indicating that the direct stream
> only runs on the driver?

I might be wrong here, but I see that the new
<http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html>
Kafka + Spark streaming code extends InputDStream
<http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.InputDStream>
and its documentation says:

* Input streams that can generate RDDs from new data by running a
service/thread only on the driver node (that is, without running a receiver
on worker nodes) *

Thanks and regards,
Aakash Pradeep
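
[Editor's note: the security features introduced in Kafka 0.9 are driven by
consumer properties, so with the new consumer API they would be passed
through the kafkaParams map. A hedged sketch, building on the earlier
kafkaParams; the protocol, path, and password are illustrative placeholders.]

    // Illustrative security settings layered onto the kafkaParams map from
    // the earlier sketch; these consumer properties exist as of Kafka 0.9.
    val secureKafkaParams = kafkaParams ++ Map[String, Object](
      "security.protocol" -> "SASL_SSL",                       // or "SSL"
      "ssl.truststore.location" -> "/path/to/truststore.jks",  // placeholder path
      "ssl.truststore.password" -> "changeit"                  // placeholder password
    )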



Re: using Spark Streaming with Kafka 0.9/0.10

Posted by Cody Koeninger <co...@koeninger.org>.
It'd probably be worth no longer marking the 0.8 interface as
experimental.  I don't think it's likely to be subject to active
development at this point.

You can use the 0.8 artifact to consume from a 0.9 broker.

Where are you reading documentation indicating that the direct stream
only runs on the driver?  It runs consumers on the worker nodes.
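
[Editor's note: on the artifact choice, a hedged sbt sketch; the Spark
version shown is illustrative, so use the one matching your cluster.]

    // The 0.8 integration artifact can consume from 0.9 brokers, while the
    // 0.10 integration requires 0.10+ brokers.
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.2"
    // or, for the newer consumer API (requires a 0.10+ broker):
    // libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.2"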



---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org