Posted to users@kafka.apache.org by Srikrishna Alla <al...@gmail.com> on 2017/01/03 16:38:28 UTC

Kafka Connect Consumer reading messages from Kafka recursively

Hi,

I am using Kafka/Kafka Connect to track certain events happening in my
application. This is how I have implemented it -
1. My application opens a KafkaProducer every time this event happens and
writes to my topic. My application has several components running on Yarn,
so I did not find a way to have just one producer and reuse it. Once the
event has been published, the producer is closed.
2. I am using Kafka Connect Sink Connector to consume from my topic and
write to DB and do other processing.

This setup works well as long as we have a stable number of events being
published. The issue I am facing is when a huge number of events (in the
thousands within minutes) hit Kafka. In this case, my Sink Connector goes
into a loop, reading events from Kafka recursively and not stopping. What
could have triggered this? Please provide your valuable insights.

Thanks,
Sri

Re: Kafka Connect Consumer reading messages from Kafka recursively

Posted by Srikrishna Alla <al...@gmail.com>.
Hi Ewen,

My assumption that this issue only happens when a huge number of events are
being published was wrong - that is just when we first noticed it. Looking
closely at the log file, I seem to be having this issue for all events: my
consumer is reading all the events from the Kafka topic and then restarting
from the beginning again, continuously. I added read/write permissions for
the Kafka client user to __consumer_offsets and restarted Kafka Connect,
but I am still facing this issue. Is there anything else I can check? I did
not have this issue when we ran on an unsecured cluster, so I keep thinking
this has something to do with a permissions issue. Your help is greatly
appreciated.

Thanks,
Sri

On Tue, Jan 3, 2017 at 5:21 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> It's a bit odd (and I just opened a JIRA about it), but you actually need
> read permission for the group and read permission for the topic.
>
> There are some error responses which may only be logged at DEBUG level, but
> I think they should all be throwing an exception and Kafka Connect would
> log that at ERROR level. The only case I can find that doesn't do that is
> if topic authorization failed.
>
> -Ewen
>
> On Tue, Jan 3, 2017 at 2:21 PM, Srikrishna Alla <allasrikrishna1@gmail.com
> >
> wrote:
>
> > Hi Ewen,
> >
> > I did not see any "ERROR" messages in the connect logs. But, I checked
> the
> > __consumer_offsets topic and it doesn't have anything in it. Should I
> > provide write permissions to this topic for my Kafka client user? I am
> > running my consumer using a different user than Kafka user.
> >
> > Thanks,
> > Sri
> >
> > On Tue, Jan 3, 2017 at 3:40 PM, Ewen Cheslack-Postava <ewen@confluent.io
> >
> > wrote:
> >
> > > On Tue, Jan 3, 2017 at 12:58 PM, Srikrishna Alla <
> > > allasrikrishna1@gmail.com>
> > > wrote:
> > >
> > > > Thanks for your response Ewen. I will try to make updates to the
> > producer
> > > > as suggested. Regd the Sink Connector consumer, Could it be that
> > > > connect-offsets topic is not getting updated with the offset
> > information
> > > > per consumer? In that case, will the connector consume the same
> > messages
> > > > again and again? Also, if that is the case, how would I be able to
> > > > troubleshoot? I am running a secured Kafka setup with SASL_PLAINTEXT
> > > setup.
> > > > Which users/groups should have access to write to the default topics?
> > If
> > > > not, please guide me in the right direction.
> > > >
> > > >
> > > For sink connectors we actually don't use the connect-offsets topic. So
> > if
> > > you only have that one sink connector running, you shouldn't expect to
> > see
> > > any writes to it. Since sink connectors are just consumer groups, they
> > use
> > > the existing __consumer_offsets topic for storage and do the commits
> via
> > > the normal consumer commit APIs. For ACLs, you'll want Read access to
> the
> > > Group and the Topic.
> > >
> > > But I doubt it is ACL issues if you're only seeing this when there is
> > heavy
> > > load. You could use the consumer offset checker to see if any offsets
> are
> > > committed for the group. Also, is there anything in the logs that might
> > > indicate a problem with the consumer committing offsets?
> > >
> > > -Ewen
> > >
> > >
> > > > Thanks,
> > > > Sri
> > > >
> > > > On Tue, Jan 3, 2017 at 1:59 PM, Ewen Cheslack-Postava <
> > ewen@confluent.io
> > > >
> > > > wrote:
> > > >
> > > > > On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla <
> > > > allasrikrishna1@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am using Kafka/Kafka Connect to track certain events happening
> in
> > > my
> > > > > > application. This is how I have implemented it -
> > > > > > 1. My application is opening a KafkaProducer every time this
> event
> > > > > happens
> > > > > > and writes to my topic. My application has several components
> > running
> > > > in
> > > > > > Yarn and so I did not find a way to have just one producer and
> > reuse
> > > > it.
> > > > > > Once the event has been published, producer is closed
> > > > > >
> > > > >
> > > > > KafkaProducer is thread safe, so you can allocate a single producer
> > per
> > > > > process and use it every time the event occurs on any thread.
> > Creating
> > > > and
> > > > > destroying a producer for every event will be very inefficient --
> not
> > > > only
> > > > > are you opening new TCP connections every time, having to lookup
> > > metadata
> > > > > every time, etc, you also don't allow the producer to get any
> benefit
> > > > from
> > > > > batching so every message will require its own request/response.
> > > > >
> > > > >
> > > > > > 2. I am using Kafka Connect Sink Connector to consume from my
> topic
> > > and
> > > > > > write to DB and do other processing.
> > > > > >
> > > > > > This setup is working great as long as we have a stable number of
> > > > events
> > > > > > published. The issue I am facing is when we have a huge number of
> > > > > events(in
> > > > > > thousands within minutes) hitting Kafka. In this case, my Sink
> > > > Connector
> > > > > is
> > > > > > going into a loop and reading events from Kafka recursively and
> not
> > > > > > stopping. What could have triggered this? Please provide your
> > > valuable
> > > > > > insights.
> > > > > >
> > > > >
> > > > > What exactly do you mean by "reading events from Kafka
> recursively"?
> > > > Unless
> > > > > it's hitting some errors that are causing consumers to fall out of
> > the
> > > > > group uncleanly and then rejoin later, you shouldn't be seeing
> > > > duplicates.
> > > > > Is there anything from the logs that might help reveal the problem?
> > > > >
> > > > > -Ewen
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Sri
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Kafka Connect Consumer reading messages from Kafka recursively

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
It's a bit odd (and I just opened a JIRA about it), but you actually need
read permission for the group and read permission for the topic.
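
For example, something along these lines (just a sketch - substitute your own
principal, topic name, and connector name; sink connector consumer groups
default to "connect-<connector name>"):

  bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
    --add --allow-principal User:connect-user --operation Read --topic my-events-topic

  bin/kafka-acls.sh --authorizer-properties zookeeper.connect=localhost:2181 \
    --add --allow-principal User:connect-user --operation Read --group connect-my-sink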

There are some error responses which may only be logged at DEBUG level, but
I think they should all be throwing an exception and Kafka Connect would
log that at ERROR level. The only case I can find that doesn't do that is
if topic authorization failed.
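
If you want to surface those DEBUG-level responses, one option is to raise the
log level for the consumer classes in the Connect worker's log4j config (in the
standard distribution that's config/connect-log4j.properties), e.g.:

  log4j.logger.org.apache.kafka.clients.consumer=DEBUG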

-Ewen

On Tue, Jan 3, 2017 at 2:21 PM, Srikrishna Alla <al...@gmail.com>
wrote:

> Hi Ewen,
>
> I did not see any "ERROR" messages in the connect logs. But, I checked the
> __consumer_offsets topic and it doesn't have anything in it. Should I
> provide write permissions to this topic for my Kafka client user? I am
> running my consumer using a different user than Kafka user.
>
> Thanks,
> Sri
>
> On Tue, Jan 3, 2017 at 3:40 PM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>
> > On Tue, Jan 3, 2017 at 12:58 PM, Srikrishna Alla <
> > allasrikrishna1@gmail.com>
> > wrote:
> >
> > > Thanks for your response Ewen. I will try to make updates to the
> producer
> > > as suggested. Regd the Sink Connector consumer, Could it be that
> > > connect-offsets topic is not getting updated with the offset
> information
> > > per consumer? In that case, will the connector consume the same
> messages
> > > again and again? Also, if that is the case, how would I be able to
> > > troubleshoot? I am running a secured Kafka setup with SASL_PLAINTEXT
> > setup.
> > > Which users/groups should have access to write to the default topics?
> If
> > > not, please guide me in the right direction.
> > >
> > >
> > For sink connectors we actually don't use the connect-offsets topic. So
> if
> > you only have that one sink connector running, you shouldn't expect to
> see
> > any writes to it. Since sink connectors are just consumer groups, they
> use
> > the existing __consumer_offsets topic for storage and do the commits via
> > the normal consumer commit APIs. For ACLs, you'll want Read access to the
> > Group and the Topic.
> >
> > But I doubt it is ACL issues if you're only seeing this when there is
> heavy
> > load. You could use the consumer offset checker to see if any offsets are
> > committed for the group. Also, is there anything in the logs that might
> > indicate a problem with the consumer committing offsets?
> >
> > -Ewen
> >
> >
> > > Thanks,
> > > Sri
> > >
> > > On Tue, Jan 3, 2017 at 1:59 PM, Ewen Cheslack-Postava <
> ewen@confluent.io
> > >
> > > wrote:
> > >
> > > > On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla <
> > > allasrikrishna1@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am using Kafka/Kafka Connect to track certain events happening in
> > my
> > > > > application. This is how I have implemented it -
> > > > > 1. My application is opening a KafkaProducer every time this event
> > > > happens
> > > > > and writes to my topic. My application has several components
> running
> > > in
> > > > > Yarn and so I did not find a way to have just one producer and
> reuse
> > > it.
> > > > > Once the event has been published, producer is closed
> > > > >
> > > >
> > > > KafkaProducer is thread safe, so you can allocate a single producer
> per
> > > > process and use it every time the event occurs on any thread.
> Creating
> > > and
> > > > destroying a producer for every event will be very inefficient -- not
> > > only
> > > > are you opening new TCP connections every time, having to lookup
> > metadata
> > > > every time, etc, you also don't allow the producer to get any benefit
> > > from
> > > > batching so every message will require its own request/response.
> > > >
> > > >
> > > > > 2. I am using Kafka Connect Sink Connector to consume from my topic
> > and
> > > > > write to DB and do other processing.
> > > > >
> > > > > This setup is working great as long as we have a stable number of
> > > events
> > > > > published. The issue I am facing is when we have a huge number of
> > > > events(in
> > > > > thousands within minutes) hitting Kafka. In this case, my Sink
> > > Connector
> > > > is
> > > > > going into a loop and reading events from Kafka recursively and not
> > > > > stopping. What could have triggered this? Please provide your
> > valuable
> > > > > insights.
> > > > >
> > > >
> > > > What exactly do you mean by "reading events from Kafka recursively"?
> > > Unless
> > > > it's hitting some errors that are causing consumers to fall out of
> the
> > > > group uncleanly and then rejoin later, you shouldn't be seeing
> > > duplicates.
> > > > Is there anything from the logs that might help reveal the problem?
> > > >
> > > > -Ewen
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Sri
> > > > >
> > > >
> > >
> >
>

Re: Kafka Connect Consumer reading messages from Kafka recursively

Posted by Srikrishna Alla <al...@gmail.com>.
Hi Ewen,

I did not see any "ERROR" messages in the Connect logs, but I checked the
__consumer_offsets topic and it doesn't have anything in it. Should I
provide write permissions to this topic for my Kafka client user? I am
running my consumer as a different user than the Kafka user.

Thanks,
Sri

On Tue, Jan 3, 2017 at 3:40 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> On Tue, Jan 3, 2017 at 12:58 PM, Srikrishna Alla <
> allasrikrishna1@gmail.com>
> wrote:
>
> > Thanks for your response Ewen. I will try to make updates to the producer
> > as suggested. Regd the Sink Connector consumer, Could it be that
> > connect-offsets topic is not getting updated with the offset information
> > per consumer? In that case, will the connector consume the same messages
> > again and again? Also, if that is the case, how would I be able to
> > troubleshoot? I am running a secured Kafka setup with SASL_PLAINTEXT
> setup.
> > Which users/groups should have access to write to the default topics? If
> > not, please guide me in the right direction.
> >
> >
> For sink connectors we actually don't use the connect-offsets topic. So if
> you only have that one sink connector running, you shouldn't expect to see
> any writes to it. Since sink connectors are just consumer groups, they use
> the existing __consumer_offsets topic for storage and do the commits via
> the normal consumer commit APIs. For ACLs, you'll want Read access to the
> Group and the Topic.
>
> But I doubt it is ACL issues if you're only seeing this when there is heavy
> load. You could use the consumer offset checker to see if any offsets are
> committed for the group. Also, is there anything in the logs that might
> indicate a problem with the consumer committing offsets?
>
> -Ewen
>
>
> > Thanks,
> > Sri
> >
> > On Tue, Jan 3, 2017 at 1:59 PM, Ewen Cheslack-Postava <ewen@confluent.io
> >
> > wrote:
> >
> > > On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla <
> > allasrikrishna1@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am using Kafka/Kafka Connect to track certain events happening in
> my
> > > > application. This is how I have implemented it -
> > > > 1. My application is opening a KafkaProducer every time this event
> > > happens
> > > > and writes to my topic. My application has several components running
> > in
> > > > Yarn and so I did not find a way to have just one producer and reuse
> > it.
> > > > Once the event has been published, producer is closed
> > > >
> > >
> > > KafkaProducer is thread safe, so you can allocate a single producer per
> > > process and use it every time the event occurs on any thread. Creating
> > and
> > > destroying a producer for every event will be very inefficient -- not
> > only
> > > are you opening new TCP connections every time, having to lookup
> metadata
> > > every time, etc, you also don't allow the producer to get any benefit
> > from
> > > batching so every message will require its own request/response.
> > >
> > >
> > > > 2. I am using Kafka Connect Sink Connector to consume from my topic
> and
> > > > write to DB and do other processing.
> > > >
> > > > This setup is working great as long as we have a stable number of
> > events
> > > > published. The issue I am facing is when we have a huge number of
> > > events(in
> > > > thousands within minutes) hitting Kafka. In this case, my Sink
> > Connector
> > > is
> > > > going into a loop and reading events from Kafka recursively and not
> > > > stopping. What could have triggered this? Please provide your
> valuable
> > > > insights.
> > > >
> > >
> > > What exactly do you mean by "reading events from Kafka recursively"?
> > Unless
> > > it's hitting some errors that are causing consumers to fall out of the
> > > group uncleanly and then rejoin later, you shouldn't be seeing
> > duplicates.
> > > Is there anything from the logs that might help reveal the problem?
> > >
> > > -Ewen
> > >
> > >
> > > >
> > > > Thanks,
> > > > Sri
> > > >
> > >
> >
>

Re: Kafka Connect Consumer reading messages from Kafka recursively

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Tue, Jan 3, 2017 at 12:58 PM, Srikrishna Alla <al...@gmail.com>
wrote:

> Thanks for your response Ewen. I will try to make updates to the producer
> as suggested. Regd the Sink Connector consumer, Could it be that
> connect-offsets topic is not getting updated with the offset information
> per consumer? In that case, will the connector consume the same messages
> again and again? Also, if that is the case, how would I be able to
> troubleshoot? I am running a secured Kafka setup with SASL_PLAINTEXT setup.
> Which users/groups should have access to write to the default topics? If
> not, please guide me in the right direction.
>
>
For sink connectors we actually don't use the connect-offsets topic. So if
you only have that one sink connector running, you shouldn't expect to see
any writes to it. Since sink connectors are just consumer groups, they use
the existing __consumer_offsets topic for storage and do the commits via
the normal consumer commit APIs. For ACLs, you'll want Read access to the
Group and the Topic.

But I doubt it is ACL issues if you're only seeing this when there is heavy
load. You could use the consumer offset checker to see if any offsets are
committed for the group. Also, is there anything in the logs that might
indicate a problem with the consumer committing offsets?
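
For example, something like this with the consumer groups tool (the group name
assumes the default "connect-<connector name>" naming; on a SASL cluster you
would point --command-config at a properties file with your client security
settings, and older versions also need the --new-consumer flag):

  bin/kafka-consumer-groups.sh --bootstrap-server broker:9092 \
    --describe --group connect-my-sink --command-config client.properties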

-Ewen


> Thanks,
> Sri
>
> On Tue, Jan 3, 2017 at 1:59 PM, Ewen Cheslack-Postava <ew...@confluent.io>
> wrote:
>
> > On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla <
> allasrikrishna1@gmail.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > I am using Kafka/Kafka Connect to track certain events happening in my
> > > application. This is how I have implemented it -
> > > 1. My application is opening a KafkaProducer every time this event
> > happens
> > > and writes to my topic. My application has several components running
> in
> > > Yarn and so I did not find a way to have just one producer and reuse
> it.
> > > Once the event has been published, producer is closed
> > >
> >
> > KafkaProducer is thread safe, so you can allocate a single producer per
> > process and use it every time the event occurs on any thread. Creating
> and
> > destroying a producer for every event will be very inefficient -- not
> only
> > are you opening new TCP connections every time, having to lookup metadata
> > every time, etc, you also don't allow the producer to get any benefit
> from
> > batching so every message will require its own request/response.
> >
> >
> > > 2. I am using Kafka Connect Sink Connector to consume from my topic and
> > > write to DB and do other processing.
> > >
> > > This setup is working great as long as we have a stable number of
> events
> > > published. The issue I am facing is when we have a huge number of
> > events(in
> > > thousands within minutes) hitting Kafka. In this case, my Sink
> Connector
> > is
> > > going into a loop and reading events from Kafka recursively and not
> > > stopping. What could have triggered this? Please provide your valuable
> > > insights.
> > >
> >
> > What exactly do you mean by "reading events from Kafka recursively"?
> Unless
> > it's hitting some errors that are causing consumers to fall out of the
> > group uncleanly and then rejoin later, you shouldn't be seeing
> duplicates.
> > Is there anything from the logs that might help reveal the problem?
> >
> > -Ewen
> >
> >
> > >
> > > Thanks,
> > > Sri
> > >
> >
>

Re: Kafka Connect Consumer reading messages from Kafka recursively

Posted by Srikrishna Alla <al...@gmail.com>.
Thanks for your response, Ewen. I will try to make updates to the producer
as suggested. Regarding the Sink Connector consumer, could it be that the
connect-offsets topic is not getting updated with the offset information
per consumer? In that case, will the connector consume the same messages
again and again? And if that is the case, how would I be able to
troubleshoot? I am running a secured Kafka cluster with SASL_PLAINTEXT.
Which users/groups should have access to write to the default topics? If I
am off track here, please guide me in the right direction.

Thanks,
Sri

On Tue, Jan 3, 2017 at 1:59 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla <allasrikrishna1@gmail.com
> >
> wrote:
>
> > Hi,
> >
> > I am using Kafka/Kafka Connect to track certain events happening in my
> > application. This is how I have implemented it -
> > 1. My application is opening a KafkaProducer every time this event
> happens
> > and writes to my topic. My application has several components running in
> > Yarn and so I did not find a way to have just one producer and reuse it.
> > Once the event has been published, producer is closed
> >
>
> KafkaProducer is thread safe, so you can allocate a single producer per
> process and use it every time the event occurs on any thread. Creating and
> destroying a producer for every event will be very inefficient -- not only
> are you opening new TCP connections every time, having to lookup metadata
> every time, etc, you also don't allow the producer to get any benefit from
> batching so every message will require its own request/response.
>
>
> > 2. I am using Kafka Connect Sink Connector to consume from my topic and
> > write to DB and do other processing.
> >
> > This setup is working great as long as we have a stable number of events
> > published. The issue I am facing is when we have a huge number of
> events(in
> > thousands within minutes) hitting Kafka. In this case, my Sink Connector
> is
> > going into a loop and reading events from Kafka recursively and not
> > stopping. What could have triggered this? Please provide your valuable
> > insights.
> >
>
> What exactly do you mean by "reading events from Kafka recursively"? Unless
> it's hitting some errors that are causing consumers to fall out of the
> group uncleanly and then rejoin later, you shouldn't be seeing duplicates.
> Is there anything from the logs that might help reveal the problem?
>
> -Ewen
>
>
> >
> > Thanks,
> > Sri
> >
>

Re: Kafka Connect Consumer reading messages from Kafka recursively

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Tue, Jan 3, 2017 at 8:38 AM, Srikrishna Alla <al...@gmail.com>
wrote:

> Hi,
>
> I am using Kafka/Kafka Connect to track certain events happening in my
> application. This is how I have implemented it -
> 1. My application is opening a KafkaProducer every time this event happens
> and writes to my topic. My application has several components running in
> Yarn and so I did not find a way to have just one producer and reuse it.
> Once the event has been published, producer is closed
>

KafkaProducer is thread safe, so you can allocate a single producer per
process and use it every time the event occurs on any thread. Creating and
destroying a producer for every event will be very inefficient -- not only
are you opening new TCP connections and looking up metadata every time,
you also don't allow the producer to get any benefit from batching, so
every message will require its own request/response.
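
A minimal sketch of what I mean (the bootstrap servers, serializers, and topic
name are placeholders you'd adapt to your setup):

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.Producer;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  public class EventReporter {
      // One producer per process, shared by all threads (KafkaProducer is thread safe).
      private static final Producer<String, String> PRODUCER = createProducer();

      private static Producer<String, String> createProducer() {
          Properties props = new Properties();
          props.put("bootstrap.servers", "broker1:9092"); // placeholder
          props.put("key.serializer", StringSerializer.class.getName());
          props.put("value.serializer", StringSerializer.class.getName());
          return new KafkaProducer<>(props);
      }

      // Call from any thread whenever the event occurs; the producer batches sends internally.
      public static void reportEvent(String key, String value) {
          PRODUCER.send(new ProducerRecord<>("my-events-topic", key, value));
      }

      // Close once at shutdown (e.g. from a shutdown hook), not after every event.
      public static void shutdown() {
          PRODUCER.close();
      }
  }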


> 2. I am using Kafka Connect Sink Connector to consume from my topic and
> write to DB and do other processing.
>
> This setup is working great as long as we have a stable number of events
> published. The issue I am facing is when we have a huge number of events(in
> thousands within minutes) hitting Kafka. In this case, my Sink Connector is
> going into a loop and reading events from Kafka recursively and not
> stopping. What could have triggered this? Please provide your valuable
> insights.
>

What exactly do you mean by "reading events from Kafka recursively"? Unless
it's hitting some errors that are causing consumers to fall out of the
group uncleanly and then rejoin later, you shouldn't be seeing duplicates.
Is there anything from the logs that might help reveal the problem?
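
As a side note, if it does turn out the consumer is being kicked out of the
group (for example because slow DB writes keep a task from polling in time),
you can pass consumer overrides through the Connect worker properties using
the "consumer." prefix; the values here are only illustrative:

  consumer.max.poll.records=500
  consumer.session.timeout.ms=30000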

-Ewen


>
> Thanks,
> Sri
>