Posted to user@spark.apache.org by Daniel van der Ende <da...@gmail.com> on 2017/07/04 14:53:57 UTC

Kafka 0.10 with PySpark

Hi,

I'm working on integrating some pyspark code with Kafka. We'd like to use
SSL/TLS, and so want to use Kafka 0.10. Because structured streaming is
still marked alpha, we'd like to use Spark Streaming. However, this page
indicates that the Kafka 0.10 integration in Spark does not support Python:
https://spark.apache.org/docs/latest/streaming-kafka-integration.html
I've been trying to figure out why, but have not been able to find
anything. Is there a particular reason for this?
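
For reference, a minimal sketch of the Kafka DStream API that is exposed to
Python today (it goes through the spark-streaming-kafka-0-8 artifact; the
broker and topic names below are placeholders, and the 0.8 direct API has no
option for the SSL settings introduced with the new consumer in Kafka
0.9/0.10):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-0-8-dstream")
    ssc = StreamingContext(sc, batchDuration=10)

    # The 0.8 direct API only accepts SimpleConsumer-style params such as
    # metadata.broker.list; there is nowhere to pass truststores or SSL.
    stream = KafkaUtils.createDirectStream(
        ssc, ["my-topic"], {"metadata.broker.list": "broker1:9092"})

    # Records arrive as (key, value) tuples; print the values each batch.
    stream.map(lambda kv: kv[1]).pprint()

    ssc.start()
    ssc.awaitTermination()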

Thanks,

Daniel

Re: Kafka 0.10 with PySpark

Posted by Saisai Shao <sa...@gmail.com>.
Please see the reasoning in this thread (
https://github.com/apache/spark/pull/14340). It would be better to use
structured streaming instead.

So I would like to -1 this patch. I think it's been a mistake to support
> dstream in Python -- yes it satisfies a checkbox and Spark could claim
> there's support for streaming in Python. However, the tooling and maturity
> for working with streaming data (both in Spark and the broader
> ecosystem) is simply not there. It is a lot of baggage to maintain, and
> creates the wrong impression that production streaming jobs can be
> written in Python.
>
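
If structured streaming is an option despite the alpha label, its Kafka 0.10
source accepts the SSL client settings by prefixing the Kafka property names
with "kafka.". A minimal sketch (broker, topic, and truststore paths are
placeholders, and it assumes the spark-sql-kafka-0-10 package is on the
classpath):

    from pyspark.sql import SparkSession

    # Submit with the Kafka source available, e.g.
    #   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 app.py
    spark = SparkSession.builder.appName("kafka-ssl-structured").getOrCreate()

    # Kafka consumer properties are passed through by prefixing them with "kafka.".
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9093")
          .option("subscribe", "my-topic")
          .option("kafka.security.protocol", "SSL")
          .option("kafka.ssl.truststore.location", "/path/to/truststore.jks")
          .option("kafka.ssl.truststore.password", "truststore-password")
          .load())

    # Keys and values come in as binary; cast them to strings for a console sink.
    query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
             .writeStream
             .format("console")
             .start())

    query.awaitTermination()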
