Posted to users@kafka.apache.org by Hoang Bao Thien <hb...@gmail.com> on 2016/11/17 09:45:10 UTC

Kafka gives very few messages

Hello,

I would like to ask a question about the throughput of a Kafka stream. I
want to push a large data file (e.g., a *.csv file) to Kafka and then use
Spark Streaming to read it back from Kafka.
The file is about 100 MB and contains ~250K messages/rows (each row has
about 10 integer fields).
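The producer side can be sketched roughly as follows (a minimal sketch using the kafka-python client; the broker address, topic name, and file path are placeholders, not taken from my actual setup):

```python
# Sketch: read a CSV file and send each row as one Kafka message.
# Broker address ("localhost:9092"), topic ("csv-topic"), and the file
# path ("data.csv") are placeholder names.

def rows_to_messages(lines):
    """Encode each CSV row (one text line) as a UTF-8 message payload."""
    return [line.rstrip("\n").encode("utf-8") for line in lines]

if __name__ == "__main__":
    from kafka import KafkaProducer  # kafka-python; needs a running broker
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    with open("data.csv") as f:
        for payload in rows_to_messages(f):
            producer.send("csv-topic", payload)
    producer.flush()  # block until all ~250K messages are delivered
```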
I see that Spark Streaming received the first two partitions/batches with a
large number of messages (the first had 60K messages and the second 50K).
But from the third batch onward, Spark received exactly 200 messages per
batch (or partition), which is very few compared to the first batches.
In addition, when I push other files to Kafka, every batch contains exactly
200 messages, like the third batch of the first file.


I think this problem comes from Kafka or from some configuration in Spark.
I already tried setting "auto.offset.reset=largest", but the problem is not
resolved; I still get only 200 messages per batch.
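For context, this is roughly how I pass that setting to the Kafka direct stream (a minimal fragment; the broker address and topic name are placeholders):

```python
# Kafka parameters handed to the Spark Streaming direct stream.
# "localhost:9092" is a placeholder broker address.
kafka_params = {
    "metadata.broker.list": "localhost:9092",
    "auto.offset.reset": "largest",  # the setting I tried; did not help
}
```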

I hope my problem is clear. Could anyone tell me how to fix it, please?
Thank you so much.

Best regards,
Alex