Posted to user@spark.apache.org by Hamish Whittal <ha...@cloud-fundis.co.za> on 2020/08/12 12:11:25 UTC

Spark Streaming with Kafka and Python

Hi folks,

Thought I would ask here because it's somewhat confusing. I'm using Spark
2.4.5 on EMR 5.30.1 with Amazon MSK.

The version of Scala used is 2.11.12, and the Kafka library I'm using is
spark-streaming-kafka-0-8_2.11-2.4.5.jar.

Now I'm wanting to read from Kafka topics using Python (I need to stick to
Python specifically).

What seems confusing is that 0.8 has Python support, but 0.10 does not.
Then 0.8 seems to have been deprecated as of Spark 2.3.0, so if I'm using
2.4.5 then clearly I'm going to hit a roadblock here.

Can someone clarify these things for me? Have I got this right?

Thanks in advance,
Hamish

Re: Spark Streaming with Kafka and Python

Posted by Sean Owen <sr...@gmail.com>.
What supports Python in (Kafka?) 0.8? I don't think Spark ever had a
specific Python-Kafka integration. But you have always been able to
use it to read DataFrames as in Structured Streaming.
Kafka 0.8 support is deprecated (gone in 3.0) but 0.10 means 0.10+ -
works with the latest 2.x.
What is the issue?
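
For reference, the 0.10+ integration used by Structured Streaming ships as a
separate artifact, spark-sql-kafka-0-10, which the Spark 2.4.5 docs list under
the org.apache.spark group. Below is a minimal sketch of how its coordinate is
put together from the Scala and Spark versions mentioned in this thread. The
helper name kafka_sql_package and the script name my_stream_job.py are
hypothetical, chosen only for illustration.

```python
# Sketch: build the Maven coordinate for the Structured Streaming Kafka source.
# kafka_sql_package is a hypothetical helper, not part of Spark.

def kafka_sql_package(scala_version: str, spark_version: str) -> str:
    """Return the coordinate of the Kafka 0.10+ source for Spark SQL."""
    # Artifact names carry the Scala binary version (2.11), not the full
    # patch version (2.11.12), so keep only the first two components.
    scala_binary = ".".join(scala_version.split(".")[:2])
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary}:{spark_version}"


package = kafka_sql_package("2.11.12", "2.4.5")
# my_stream_job.py is a placeholder for your own PySpark script.
print(f"spark-submit --packages {package} my_stream_job.py")
```

Passing the coordinate via --packages lets spark-submit fetch the jar and its
dependencies, which should avoid copying jars onto the cluster by hand.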

On Wed, Aug 12, 2020 at 7:53 AM German Schiavon
<gs...@gmail.com> wrote:
>
> Hey,
>
> Maybe I'm missing some restriction with EMR, but have you tried to use Structured Streaming instead of Spark Streaming?
>
> https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html
>
> Regards

Re: Spark Streaming with Kafka and Python

Posted by German Schiavon <gs...@gmail.com>.
Hey,

Maybe I'm missing some restriction with EMR, but have you tried to use
Structured Streaming instead of Spark Streaming?

https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html

Regards
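
A minimal PySpark sketch of what that could look like, assuming the
spark-sql-kafka-0-10 package is available to the job. The function names, the
broker address and the topic name are placeholders, not anything from this
thread, and a cluster that requires TLS or authentication would likely need
extra "kafka."-prefixed client options.

```python
# Sketch: read a Kafka topic as a streaming DataFrame from Python.
# Function names, broker address and topic are hypothetical placeholders.

def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Collect the options passed to the Structured Streaming Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # host:port[,host:port]
        "subscribe": topic,                            # topic name(s) to read
        "startingOffsets": starting_offsets,           # "latest" or "earliest"
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame with key and value decoded as strings."""
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers key and value as binary columns; cast them for readability.
    return raw.selectExpr("CAST(key AS STRING) AS key",
                          "CAST(value AS STRING) AS value")


# Usage, on a cluster where the Kafka source package is on the classpath:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("kafka-read-sketch").getOrCreate()
#   events = read_kafka_stream(spark, "broker1:9092", "my-topic")
#   query = events.writeStream.format("console").start()
#   query.awaitTermination()
```

The SparkSession is passed in rather than created inside the function, so the
same code can be reused from a notebook or a spark-submit job.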
