Posted to user@spark.apache.org by churly lin <ch...@gmail.com> on 2017/03/13 07:20:40 UTC

The speed of Spark streaming reading data from kafka stays low

Hi all:
I am using Spark Streaming (1.6.2) with Kafka (0.10.1.0). To be specific, I
read events from a Kafka topic via the Spark Streaming direct approach.
Kafka: 1 topic, 10 partitions.
Spark Streaming: 10 executors, matching the 10 Kafka partitions. The
batch window is set to 60s.
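
For reference, here is a minimal sketch of how the stream is set up. The
broker list, topic name, and object name are placeholders, not taken from
my actual job:

    import kafka.serializer.StringDecoder

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object KafkaDirectExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaDirectExample")
        // 60-second batch interval, as described above
        val ssc = new StreamingContext(conf, Seconds(60))

        // Placeholder broker list; the direct approach reads from the
        // brokers directly rather than going through a receiver
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val topics = Set("events")  // placeholder topic name

        // Direct approach: one RDD partition per Kafka partition, so the
        // 10 Kafka partitions yield 10 parallel tasks per batch
        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

        // A trivial action so each batch is actually consumed
        stream.foreachRDD(rdd => println(s"events in this batch: ${rdd.count()}"))

        ssc.start()
        ssc.awaitTermination()
      }
    }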

After running for a while, the Spark Streaming processing time is about 20s,
much less than the batch window. But no matter how the input rate of the Kafka
producer changes (3000 events/sec, 4000 events/sec, 6000 events/sec), the
input rate of Spark Streaming (the Kafka consumer side) stays at about 3000
events/sec, which means the consumer side can't catch up with the producer
side. So, is there a way to increase the throughput of the Spark Streaming +
Kafka (direct approach) system?

I have tried increasing the Kafka partitions from 10 to 20 and, accordingly,
the executors from 10 to 20, but it didn't help.

Thanks.

Re: The speed of Spark streaming reading data from kafka stays low

Posted by Lysiane Bouchard <bo...@gmail.com>.
Hi,

If you haven't already, I would recommend verifying the following
configuration properties:
spark.streaming.kafka.maxRatePerPartition
spark.streaming.backpressure.enabled
spark.streaming.receiver.maxRate

See the documentation for your Spark Streaming version here
<https://spark.apache.org/docs/1.6.2/configuration.html#spark-streaming> for
more details.
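
For example, these can be set on the SparkConf before the StreamingContext is
created. The values below are illustrative, not recommendations. Note that with
the direct approach the intake per batch is capped at maxRatePerPartition x
number of partitions x batch seconds, so if that property happened to be set
to 300 in your job (just a guess on my part), 300 x 10 partitions would give
exactly the ~3000 events/sec plateau you are seeing:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("KafkaDirectExample")
      // Per-partition cap: 1000 rec/s x 10 partitions x 60s batch
      // = at most 600,000 records per batch (illustrative value)
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // Let Spark adapt the ingestion rate to the observed processing
      // rate; with 20s processing per 60s batch there is headroom to grow
      .set("spark.streaming.backpressure.enabled", "true")
      // Applies to receiver-based streams only, not the direct approach;
      // listed for completeness
      .set("spark.streaming.receiver.maxRate", "10000")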

Good luck!
