Posted to users@kafka.apache.org by Dhawan Gajendran <dh...@datavisor.com> on 2017/10/17 06:30:02 UTC

Kafka connect rebalance problem

Hi All,

We did a POC with the Confluent S3 connector for Kafka Connect and have been
trying to move the setup to production. In the current setup we have 7
Kafka topics, and each topic's messages are drained to an S3 bucket using the
Confluent S3 connector.

My setup:
There are 8 workers distributed across 8 ubuntu JVM instances. There are 7
connectors configured with each connector configured with 8 tasks
(Averaging about 4 tasks per connector). Each connector consumes messages
from a unique Kafka topic and writes these messages to an S3 bucket using
the HourlyPartitioner class.

We recently lost two of our workers, which were replaced with two new workers
that were assigned the same group.id.
However, after bringing up the new workers, all connectors started responding
with an HTTP 409 error code stating that the workers are rebalancing. I am
unable to update the configurations of my connectors.

The system has been continuously trying to rebalance for over 12 hours to
no avail.

My questions:
1.) How do I debug my current system state? Can I look at some logs to see
what's happening, and how can I fix this issue?

I have read through these resources:
https://docs.confluent.io/current/connect/design.html and
https://docs.confluent.io/current/connect/concepts.html. However, I still do
not understand how to debug or understand the core problem in my specific case.
I do realize there seems to be a rebalancing problem occurring when I bring
up a new worker or add a new task to a connector, but what I do not
understand is how to stop this rebalancing, force a rebalance, or reset
the system to the new setup.

*Current commands I use for debugging my setup:*
curl -XGET localhost:8083/connectors/<connector>/config
curl -XGET localhost:8083/connectors/<connector>/status
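
A minimal sketch of how the two calls above could be extended to check every connector in one pass. It assumes the worker's REST interface is reachable at localhost:8083, as in the curl commands, and relies only on the GET /connectors and GET /connectors/<name>/status endpoints. The names unhealthy_tasks, fetch_json and check_all are illustrative, not part of any Kafka Connect client library.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8083"  # assumption: same worker address as the curl commands


def unhealthy_tasks(status):
    """Return (task id, state, worker_id) for every task that is not RUNNING.

    `status` is the parsed JSON body of GET /connectors/<name>/status.
    """
    return [
        (task.get("id"), task.get("state"), task.get("worker_id"))
        for task in status.get("tasks", [])
        if task.get("state") != "RUNNING"
    ]


def fetch_json(path, base_url=BASE_URL):
    """GET a path on the Connect REST interface and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_all(base_url=BASE_URL):
    """Collect the connector state and the non-RUNNING tasks of each connector."""
    report = {}
    for name in fetch_json("/connectors", base_url):
        quoted = urllib.parse.quote(name, safe="")
        status = fetch_json("/connectors/" + quoted + "/status", base_url)
        report[name] = {
            "connector_state": status.get("connector", {}).get("state"),
            "unhealthy": unhealthy_tasks(status),
        }
    return report
```

Calling check_all() and printing the result gives one summary per connector. While the workers are rebalancing, the REST calls may fail with HTTP 409, in which case urllib raises urllib.error.HTTPError; the sketch does not catch that.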


Appended the following into my connect-log4j.properties file:
======
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c:%L)%n
========
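
Once the worker logs use the pattern above, they can be scanned for rebalance activity. The sketch below parses lines written with the ConversionPattern "[%d] %p %m (%c:%L)%n" and counts, per minute, the messages that contain a keyword. It assumes that %d produces log4j's default "yyyy-MM-dd HH:mm:ss,SSS" timestamp and that a case-insensitive match on "rebalanc" is a useful filter; both are assumptions, and parse_line and count_by_minute are hypothetical helper names.

```python
import re
from collections import Counter

# Matches one line written with ConversionPattern "[%d] %p %m (%c:%L)%n".
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+) (?P<msg>.*) "
    r"\((?P<logger>[^():]+):(?P<line>\d+|\?)\)$"
)


def parse_line(line):
    """Split a log line into its fields, or return None if it does not match
    (for example, a stack-trace continuation line)."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def count_by_minute(lines, keyword="rebalanc"):
    """Count matching messages per minute, keyed by 'yyyy-MM-dd HH:mm'."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and keyword.lower() in record["msg"].lower():
            counts[record["ts"][:16]] += 1  # assumes the default %d timestamp layout
    return counts
```

Passing an open log file to count_by_minute shows whether the matching messages recur at a steady interval, which would suggest a rebalance loop rather than a single long rebalance.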

Is there anything else I could do to give me more visibility into my system?

Thanks

Re: Kafka connect rebalance problem

Posted by Dhawan Gajendran <dh...@datavisor.com>.
Thank you for the reply.

I have 7 Kafka S3 connectors. Each connector has been configured with 4
tasks. However, I am trying to add more tasks for a few of the S3 connectors.
When I try to change the configuration of these connectors using the REST
API, the REST endpoint returns the worker rebalance error.
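
For reference, a hedged sketch of retrying such a configuration update while the endpoint answers with HTTP 409. It assumes the update is sent with PUT /connectors/<name>/config, which expects the connector's full configuration rather than only the changed key. The helper names put_connector_config and update_with_retry, and the retry count and delay, are illustrative. A retry only helps with a rebalance that eventually finishes; it does not address one that never settles.

```python
import json
import time
import urllib.error
import urllib.request


def put_connector_config(base_url, name, config, timeout=10):
    """Send PUT /connectors/<name>/config and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/connectors/" + name + "/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 409 while the workers are rebalancing


def update_with_retry(send, attempts=5, delay_s=10.0, sleep=time.sleep):
    """Call send() until it returns a code other than 409 or attempts run out.

    Returns the last status code seen.
    """
    code = None
    for attempt in range(attempts):
        code = send()
        if code != 409:
            return code
        if attempt < attempts - 1:
            sleep(delay_s)
    return code
```

A call could look like update_with_retry(lambda: put_connector_config("http://localhost:8083", "<connector>", new_config)), where new_config is the complete configuration dictionary with the raised tasks.max value.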

I am using the Confluent OSS 2.11 version: S3 connector version 3.3.0,
Kafka v 11.0.1, and JDK 1.8.

On Oct 17, 2017 1:37 AM, "Ted Yu" <yu...@gmail.com> wrote:

> bq. There are 7 connectors configured with each connector configured with 8
> tasks (Averaging about 4 tasks per connector)
>
> Pardon. I don't quite understand the above setup. Can you describe it in
> more detail?
>
> Which version of the connector are you using?
>
> Cheers

Re: Kafka connect rebalance problem

Posted by Ted Yu <yu...@gmail.com>.
bq. There are 7 connectors configured with each connector configured with 8
tasks (Averaging about 4 tasks per connector)

Pardon. I don't quite understand the above setup. Can you describe it in more
detail?

Which version of the connector are you using?

Cheers
