Posted to users@kafka.apache.org by BYEONG-GI KIM <bg...@bluedigm.com> on 2016/06/01 00:56:59 UTC

Scalability of Kafka Consumer 0.9.0.1

Hello.

I've implemented a Kafka consumer application that consumes a large amount of
monitoring data from a Kafka broker and analyzes it accordingly.

I referred to this guide,
http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client,
because I thought the app needed a multi-threaded Kafka consumer, one thread per
topic. Each open-source monitoring tool, e.g., Nagios, Collectd, etc., is
assigned its own topic so that their data can be told apart, because each
tool uses its own message format, such as JSON, plain strings, and so on.

However, I got an exception even though my Kafka consumer code is mostly
copied and pasted from the guide:
*java.util.ConcurrentModificationException:
KafkaConsumer is not safe for multi-threaded access*
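For context, this exception comes from a check inside the client that detects a second thread entering the consumer while another thread is still inside a call (`wakeup()` is the documented exception that may be called from another thread). The following is a simplified, hypothetical sketch of that kind of ownership check, not the client's source; the names `SingleThreadGuard` and `GuardDemo` are illustrative. The first thread to enter records its id, and any other thread that tries to enter before it leaves is rejected.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified ownership guard (NOT the KafkaConsumer source).
// It remembers which thread is inside a call and rejects any other thread.
class SingleThreadGuard {
    private static final long NO_OWNER = -1L;
    private final AtomicLong owner = new AtomicLong(NO_OWNER);
    private int depth = 0; // only read/written by the thread that holds `owner`

    void acquire() {
        long me = Thread.currentThread().getId();
        // Re-entry by the owner is fine; anyone else must find the guard free.
        if (owner.get() != me && !owner.compareAndSet(NO_OWNER, me)) {
            throw new ConcurrentModificationException("not safe for multi-threaded access");
        }
        depth++;
    }

    void release() {
        if (--depth == 0) {
            owner.set(NO_OWNER);
        }
    }
}

// Hypothetical object protected by the guard, standing in for a consumer.
class GuardDemo {
    private final SingleThreadGuard guard = new SingleThreadGuard();

    // Stands in for a long call such as poll(): holds the guard until `done` opens.
    void slowCall(CountDownLatch entered, CountDownLatch done) throws InterruptedException {
        guard.acquire();
        try {
            entered.countDown();
            done.await();
        } finally {
            guard.release();
        }
    }

    void quickCall() {
        guard.acquire();
        guard.release();
    }

    // True if the calling thread is rejected while another thread is inside slowCall.
    static boolean secondThreadRejected() {
        GuardDemo shared = new GuardDemo();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);
        Thread first = new Thread(() -> {
            try {
                shared.slowCall(entered, done);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        first.start();
        boolean rejected = false;
        try {
            entered.await(); // wait until `first` definitely holds the guard
            try {
                shared.quickCall(); // a second thread enters the shared object
            } catch (ConcurrentModificationException e) {
                rejected = true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            done.countDown(); // always let the first thread finish
        }
        try {
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }
}
```

Sequential calls from one thread, or from different threads that never overlap, pass the check; only overlapping access from two threads is rejected. That is why sharing one consumer between threads can fail intermittently rather than on every call.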

First question: does the implementation in the guide really prevent this
exception?

Second question: can a KafkaConsumer handle such a huge amount of data with a
single thread? KafkaConsumer does not seem to be thread-safe, but it can
subscribe to multiple topics at once. Do I need to change the implementation
from multi-threaded to a single thread subscribing to multiple topics? I'm
just wondering whether one KafkaConsumer can keep up with this volume of data
without performance degradation.

Thanks in advance!

Best regards

KIM

Re: Scalability of Kafka Consumer 0.9.0.1

Posted by BYEONG-GI KIM <bg...@bluedigm.com>.
Hmm, then is it feasible to assign different, non-overlapping topics to each
thread when implementing the Kafka consumer with multiple threads?
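A minimal sketch of the thread-per-topic layout asked about here, using a hypothetical `TopicReader` in place of a real consumer and a fixed map of messages in place of the broker. What it illustrates is confinement: each task constructs its own reader, so no reader is ever touched by two threads, and results come back through `Future`s rather than shared state. With the real client, the analogous step would be creating a separate `KafkaConsumer` inside each thread and subscribing it to that thread's topic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a consumer (NOT the Kafka API): a finite,
// non-thread-safe message source with a poll-like method.
class TopicReader {
    private final Iterator<String> source;

    TopicReader(List<String> messages) {
        this.source = new ArrayList<>(messages).iterator();
    }

    // Returns up to `max` messages; an empty list means the source is drained.
    List<String> poll(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}

class ThreadPerTopic {
    // input: topic -> messages (hypothetical fixed data). Returns topic -> messages processed.
    static Map<String, Integer> run(Map<String, List<String>> input) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, input.size()));
        Map<String, Future<Integer>> pending = new TreeMap<>();
        try {
            for (Map.Entry<String, List<String>> e : input.entrySet()) {
                List<String> messages = e.getValue();
                pending.put(e.getKey(), pool.submit(() -> {
                    // Created inside the task, so it is confined to this worker thread.
                    TopicReader reader = new TopicReader(messages);
                    int count = 0;
                    for (List<String> b = reader.poll(100); !b.isEmpty(); b = reader.poll(100)) {
                        count += b.size(); // real code would parse/analyze each message here
                    }
                    return count;
                }));
            }
            Map<String, Integer> result = new TreeMap<>();
            for (Map.Entry<String, Future<Integer>> f : pending.entrySet()) {
                result.put(f.getKey(), f.getValue().get()); // waits for that topic's task
            }
            return result;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The design choice here is that threads never share a reader; the only data crossing threads is the final count returned through each `Future`.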

2016-06-01 22:14 GMT+09:00 Christian Posta <ch...@gmail.com>:

> Gerard is correct.
>
> The unit of parallelism in Kafka is the topic and its partitions. Within a
> consumer group, each partition of a topic is consumed by a single
> thread/consumer (even when multiple topics are subscribed). KafkaConsumer is
> NOT thread-safe and should not be shared between threads.

Re: Scalability of Kafka Consumer 0.9.0.1

Posted by Christian Posta <ch...@gmail.com>.
Gerard is correct.

The unit of parallelism in Kafka is the topic and its partitions. Within a
consumer group, each partition of a topic is consumed by a single
thread/consumer (even when multiple topics are subscribed). KafkaConsumer is
NOT thread-safe and should not be shared between threads.
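To make the "one consumer per partition" rule concrete, here is a hypothetical round-robin assignment over a sorted list of topic-partitions. It is similar in spirit to the client's round-robin assignment strategy but is not its code, and `PartitionAssignmentSketch` is an illustrative name. Every partition lands on exactly one consumer; when there are more consumers than partitions, the extra consumers receive nothing and sit idle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class PartitionAssignmentSketch {
    // partitionsPerTopic: topic -> partition count. Returns consumer index ->
    // assigned "topic-partition" labels. Each partition is assigned exactly once.
    static Map<Integer, List<String>> assign(Map<String, Integer> partitionsPerTopic, int consumers) {
        if (consumers <= 0) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        // Stable ordering: topics sorted by name, partitions in ascending order.
        List<String> all = new ArrayList<>();
        for (String topic : new TreeSet<>(partitionsPerTopic.keySet())) {
            for (int p = 0; p < partitionsPerTopic.get(topic); p++) {
                all.add(topic + "-" + p);
            }
        }
        Map<Integer, List<String>> result = new TreeMap<>();
        for (int c = 0; c < consumers; c++) {
            result.put(c, new ArrayList<>());
        }
        // Deal partitions out like cards; consumers beyond the partition count stay empty.
        for (int i = 0; i < all.size(); i++) {
            result.get(i % consumers).add(all.get(i));
        }
        return result;
    }
}
```

For example, with topics `collectd` (2 partitions) and `nagios` (3 partitions) and 2 consumers, consumer 0 gets `collectd-0`, `nagios-0`, `nagios-2` and consumer 1 gets `collectd-1`, `nagios-1`. With 6 consumers, one of them gets no partition at all.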

On Wed, Jun 1, 2016 at 12:11 AM, Gerard Klijs <ge...@dizzit.com>
wrote:

> If I understand it correctly, each consumer should have its 'own' thread
> and should not be accessible from other threads. But you could
> (dynamically) create enough threads to cover all the partitions, so that
> each consumer reads from only one partition. You could also let all those
> consumers access some thread-safe object if you need to combine the results.
> In your linked example the consumers each just do their own part, which
> solves the multi-threading issue, but when you want to combine data from
> different consumer threads it becomes trickier.



-- 
*Christian Posta*
twitter: @christianposta
http://www.christianposta.com/blog
http://fabric8.io

Re: Scalability of Kafka Consumer 0.9.0.1

Posted by Gerard Klijs <ge...@dizzit.com>.
If I understand it correctly, each consumer should have its 'own' thread
and should not be accessible from other threads. But you could
(dynamically) create enough threads to cover all the partitions, so that
each consumer reads from only one partition. You could also let all those
consumers access some thread-safe object if you need to combine the results.
In your linked example the consumers each just do their own part, which
solves the multi-threading issue, but when you want to combine data from
different consumer threads it becomes trickier.
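A minimal sketch of the approach described above: one thread per partition, each reading only its own partition, with results combined in a shared thread-safe map. The partitions are plain in-memory lists and the `host:metric` message format and `PartitionWorkers` name are hypothetical. `ConcurrentHashMap.merge` applies each per-key update atomically, so the workers need no extra locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class PartitionWorkers {
    // partitions: one list of hypothetical "host:metric" messages per partition.
    // Starts one thread per partition and returns the combined count of messages per host.
    static Map<String, Long> countByHost(List<List<String>> partitions) {
        ConcurrentHashMap<String, Long> totals = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (List<String> partition : partitions) {
            // Each thread touches only its own partition; `totals` is the single shared object.
            Thread t = new Thread(() -> {
                for (String msg : partition) {
                    String host = msg.split(":", 2)[0];
                    totals.merge(host, 1L, Long::sum); // atomic read-modify-write per key
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return new TreeMap<>(totals); // sorted snapshot once all workers are done
    }
}
```

The "tricky" part Gerard mentions shows up as soon as the combined result needs more than per-key atomicity (for example, a consistent view across several keys); a single `merge` per message avoids that here.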
