Posted to users@kafka.apache.org by Stephane Maarek <st...@simplemachines.com.au> on 2017/01/05 23:12:30 UTC

One big kafka connect cluster or many small ones?

Hi,

We like to operate with microservices (we dockerize everything and ship it on
ECS), and I was wondering which approach is preferred.
We have one Kafka cluster, one ZooKeeper cluster, etc., but when it comes to
Kafka Connect I have some doubts.

Is it better to have one big Kafka Connect cluster with multiple nodes, or
many small Kafka Connect clusters (or standalone workers), one per connector
/ ETL job?

The issues I’m trying to address are:
 - Integration with our CI/CD pipeline
 - Efficient resource utilisation
 - Easily add new jar files that connectors depend on with minimal downtime
 - Monitoring operations

Thanks for your guidance

Regards,
Stephane

Re: One big kafka connect cluster or many small ones?

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Yeah, you'd set the key.converter and/or value.converter in your connector
config.
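
For example, something like the following should do it (an untested sketch --
the connector name, host, and file/topic settings are just placeholders; PUT
on /connectors/<name>/config creates the connector if it doesn't exist yet
and updates it otherwise):

  curl -X PUT -H "Content-Type: application/json" \
    http://connect-host:8083/connectors/json-file-source/config \
    -d '{
      "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
      "tasks.max": "1",
      "file": "/tmp/events.json",
      "topic": "events-json",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter"
    }'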

-Ewen

On Thu, Jan 5, 2017 at 9:50 PM, Stephane Maarek <
stephane@simplemachines.com.au> wrote:

> Thanks!
> So I just override the config when making the API call? It’d be great to
> have this documented somewhere on the Confluent website. I couldn’t find
> it.

Re: One big kafka connect cluster or many small ones?

Posted by Stephane Maarek <st...@simplemachines.com.au>.
Thanks!
So I just override the config when making the API call? It’d be great to have
this documented somewhere on the Confluent website. I couldn’t find it.


Re: One big kafka connect cluster or many small ones?

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Thu, Jan 5, 2017 at 7:19 PM, Stephane Maarek <
stephane@simplemachines.com.au> wrote:

> Thanks a lot for the guidance, I think we’ll go ahead with one cluster. I
> just need to figure out how our CD pipeline can talk to our Connect cluster
> securely (because it’ll need direct access to perform API calls).
>

The documentation isn't great here, but you can apply all the normal
security configs to Connect (in distributed mode, it's basically equivalent
to a consumer, so everything you can do with a consumer you can do with
Connect).
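
For example, if your brokers use SSL, the worker config gets something like
the following (a sketch only -- hostnames and paths are placeholders), and
the same settings are repeated under the producer./consumer. prefixes for the
clients that Connect creates for the connectors themselves:

  # Connect worker -> broker security settings
  bootstrap.servers=broker1:9093
  security.protocol=SSL
  ssl.truststore.location=/var/private/ssl/connect.truststore.jks
  ssl.truststore.password=<redacted>

  # same settings for the connectors' producers and consumers
  producer.security.protocol=SSL
  producer.ssl.truststore.location=/var/private/ssl/connect.truststore.jks
  producer.ssl.truststore.password=<redacted>
  consumer.security.protocol=SSL
  consumer.ssl.truststore.location=/var/private/ssl/connect.truststore.jks
  consumer.ssl.truststore.password=<redacted>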


>
> Lastly, a question or maybe a piece of feedback… is it not possible to
> specify the key serializer and deserializer as part of the REST API job
> config?
> The issue is that sometimes our data is Avro and sometimes it’s JSON. And it
> seems I’d need two separate clusters for that?
>

This is new! As of 0.10.1.0, we have
https://cwiki.apache.org/confluence/display/KAFKA/KIP-75+-+Add+per-connector+Converters
which allows you to set the converter in the connector config. It's called a
"Converter" in Connect because it does a bit more than the plain ser/des
you'd write for Kafka, but converters are basically just pluggable ser/des.
We knew folks would want this; it just took us a while to find the bandwidth
to implement it. Now you shouldn't need to do anything special or deploy
multiple clusters -- it's baked in and supported as long as you are willing
to override it on a per-connector basis (and this seems reasonable for most
folks since *ideally* you are *somewhat* standardized on a common
serialization format).
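
As a sketch of the pattern (the Avro converter and Schema Registry URL below
assume a Confluent-style setup -- substitute whatever your common format
actually is): keep the cluster-wide default in the worker config and only
override it for the connectors that deal in JSON:

  # worker config: cluster-wide default converters
  key.converter=io.confluent.connect.avro.AvroConverter
  key.converter.schema.registry.url=http://schema-registry:8081
  value.converter=io.confluent.connect.avro.AvroConverter
  value.converter.schema.registry.url=http://schema-registry:8081

  # ...and in an individual connector's JSON config, just add e.g.:
  #   "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  #   "value.converter": "org.apache.kafka.connect.json.JsonConverter"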

-Ewen


>
> On 6 January 2017 at 1:54:10 pm, Ewen Cheslack-Postava (ewen@confluent.io)
> wrote:
>
> On Thu, Jan 5, 2017 at 3:12 PM, Stephane Maarek <
> stephane@simplemachines.com.au> wrote:
>
> > Hi,
> >
> > We like to operate in micro-services (dockerize and ship everything on
> ecs)
> > and I was wondering which approach was preferred.
> > We have one kafka cluster, one zookeeper cluster, etc, but when it comes
> to
> > kafka connect I have some doubts.
> >
> > Is it better to have one big kafka connect with multiple nodes, or many
> > small kafka connect clusters or standalone, for each connector / etl ?
> >
>
> You can do any of these, and it may depend on how you do
> orchestration/deployment.
>
> We built Connect to support running one big cluster running a bunch of
> connectors. It balances work automatically and provides a way to control
> scale up/down via increased parallelism. This means we don't need to make
> any assumptions about how you deploy, how you handle elastically scaling
> your clusters, etc. But if you run in an environment and have the tooling
> in place to do that already, you can also opt to run many smaller clusters
> and use that tooling to scale up/down. In that case you'd just make sure
> there were enough tasks for each connector so that when you scale the # of
> workers for a cluster up the rebalancing of work would ensure there was
> enough tasks for every worker to remain occupied.
>
> The main drawback of doing this is that Connect uses a few topics to for
> configs, status, and offsets and you need these to be unique per cluster.
> This means you'll have 3N more topics. If you're running a *lot* of
> connectors, that could eventually become a problem. It also means you have
> that many more worker configs to handle, clusters to monitor, etc. And
> deploying a connector no longer becomes as simple as just making a call to
> the service's REST API since there isn't a single centralized service. The
> main benefits I can think of are a) if you already have preferred tooling
> for handling elasticity and b) better resource isolation between
> connectors
> (i.e. an OOM error in one connector won't affect any other connectors).
>
> For standalone mode, we'd generally recommend only using it when
> distributed mode doesn't make sense, e.g. for log file collection. Other
> than that, having the fault tolerance and high availability of distributed
> mode is preferred.
>
> On your specific points:
>
> >
> > The issues I’m trying to address are :
> > - Integration with our CI/CD pipeline
> >
>
> I'm not sure anything about Connect affects this. Is there a specific
> concern you have about the CI/CD pipeline & Connect?
>
>
> > - Efficient resources utilisation
> >
>
> Putting all the connectors into one cluster will probably result in better
> resource utilization unless you're already automatically tracking usage
> and
> scaling appropriately. The reason is that if you use a bunch of small
> clusters, you're now stuck trying to optimize N uses. Since Connect can
> already (roughly) balance work, putting all the work into one cluster and
> having connect split it up means you just need to watch utilization of the
> nodes in that one cluster and scale up or down as appropriate.
>
>
> > - Easily add new jar files that connectors depend on with minimal
> downtime
> >
>
> This one is a bit interesting. You shouldn't have any downtime adding jars
> in the sense that you can do rolling bounces of Connect. The one caveat is
> that the current limitation for how it rebalances work involves halting
> work for all connectors/tasks, doing the rebalance, and then starting them
> up again. We plan to improve this, but the timeframe for it is still
> uncertain. Usually these rebalance steps should be pretty quick. The main
> reason this can be a concern is that halting some connectors could take
> some time (e.g. because they need to fully flush their data). This means
> the period of time your connectors are not processing data during one of
> those rebalances is controlled by the "worst" connector.
>
> I would recommend trying a single cluster but monitoring whether you see
> stalls due to rebalances. If you do, then moving to multiple clusters
> might
> make sense. (This also, obviously, depends a lot on your SLA for data
> delivery.)
>
>
> > - Monitoring operations
> >
>
> Multiple clusters definitely seems messier and more complicated for this.
> There will be more workers in a single cluster, but it's a single service
> you need to monitor and maintain.
>
> Hope that helps!
>
> -Ewen
>
>
> >
> > Thanks for your guidance
> >
> > Regards,
> > Stephane
> >
>
>

Re: One big kafka connect cluster or many small ones?

Posted by Stephane Maarek <st...@simplemachines.com.au>.
Thanks a lot for the guidance, I think we’ll go ahead with one cluster. I
just need to figure out how our CD pipeline can talk to our Connect cluster
securely (because it’ll need direct access to perform API calls).

Lastly, a question or maybe a piece of feedback… is it not possible to
specify the key serializer and deserializer as part of the REST API job
config?
The issue is that sometimes our data is Avro and sometimes it’s JSON. And it
seems I’d need two separate clusters for that?


Re: One big kafka connect cluster or many small ones?

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Thu, Jan 5, 2017 at 3:12 PM, Stephane Maarek <
stephane@simplemachines.com.au> wrote:

> Hi,
>
> We like to operate with microservices (we dockerize everything and ship it
> on ECS), and I was wondering which approach is preferred.
> We have one Kafka cluster, one ZooKeeper cluster, etc., but when it comes to
> Kafka Connect I have some doubts.
>
> Is it better to have one big Kafka Connect cluster with multiple nodes, or
> many small Kafka Connect clusters (or standalone workers), one per connector
> / ETL job?
>

You can do any of these, and it may depend on how you do
orchestration/deployment.

We built Connect to support running one big cluster that runs a bunch of
connectors. It balances work automatically and provides a way to control
scale up/down via increased parallelism. This means we don't need to make
any assumptions about how you deploy, how you handle elastically scaling
your clusters, etc. But if you run in an environment where you already have
the tooling in place to do that, you can also opt to run many smaller
clusters and use that tooling to scale up/down. In that case you'd just make
sure there were enough tasks for each connector so that when you scale up
the number of workers in a cluster, the rebalance still leaves every worker
with tasks to run.
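
For instance (numbers made up), a connector that should be able to keep up
to eight workers busy would set "tasks.max": "8" in its config, and after a
scale-up you can check where the tasks actually landed:

  # each task's worker_id in the status output shows which worker runs it
  curl -s http://connect-host:8083/connectors/my-connector/status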

The main drawback of doing this is that Connect uses a few topics for
configs, status, and offsets, and you need these to be unique per cluster.
This means you'll have 3N more topics. If you're running a *lot* of
connectors, that could eventually become a problem. It also means you have
that many more worker configs to handle, clusters to monitor, etc. And
deploying a connector is no longer as simple as making a call to a single
centralized service's REST API, since there isn't one. The main benefits I
can think of are a) you can reuse tooling you already prefer for handling
elasticity, and b) better resource isolation between connectors (i.e. an OOM
error in one connector won't affect any other connectors).
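
Concretely, each cluster's worker config needs its own group.id and its own
three internal topics, along these lines (the names are just an example):

  group.id=connect-cluster-a
  config.storage.topic=connect-cluster-a-configs
  offset.storage.topic=connect-cluster-a-offsets
  status.storage.topic=connect-cluster-a-status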

For standalone mode, we'd generally recommend only using it when
distributed mode doesn't make sense, e.g. for log file collection. Other
than that, having the fault tolerance and high availability of distributed
mode is preferred.
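
For reference, the two modes are started with the scripts that ship with
Kafka (run from the Kafka install directory; the standalone connector
properties file is just a placeholder name):

  # distributed mode: fault tolerant, work is rebalanced across the workers
  bin/connect-distributed.sh config/connect-distributed.properties

  # standalone mode: a single process, connector configs given on the
  # command line (e.g. for collecting log files on one host)
  bin/connect-standalone.sh config/connect-standalone.properties my-log-source.properties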

On your specific points:

>
> The issues I’m trying to address are:
>  - Integration with our CI/CD pipeline
>

I'm not sure anything about Connect affects this. Is there a specific
concern you have about the CI/CD pipeline & Connect?


>  - Efficient resource utilisation
>

Putting all the connectors into one cluster will probably result in better
resource utilization unless you're already automatically tracking usage and
scaling appropriately. The reason is that if you use a bunch of small
clusters, you're stuck trying to optimize N deployments separately. Since
Connect can already (roughly) balance work, putting all the work into one
cluster and having Connect split it up means you just need to watch the
utilization of the nodes in that one cluster and scale up or down as
appropriate.


>  - Easily add new jar files that connectors depend on with minimal downtime
>

This one is a bit interesting. You shouldn't have any downtime adding jars,
in the sense that you can do rolling bounces of Connect. The one caveat is
that the way it currently rebalances work involves halting work for all
connectors/tasks, doing the rebalance, and then starting them up again. We
plan to improve this, but the timeframe is still uncertain. Usually these
rebalance steps should be pretty quick. The main reason this can be a
concern is that halting some connectors could take some time (e.g. because
they need to fully flush their data). This means the period of time your
connectors are not processing data during one of those rebalances is
controlled by the "worst" connector.
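
A rolling bounce is just something like the following (a rough sketch --
hostnames, the jar path, and the service name are placeholders, and the jar
directory is assumed to already be on the workers' classpath):

  for host in connect1 connect2 connect3; do
    scp my-connector-deps.jar $host:/opt/connect/libs/
    ssh $host 'sudo systemctl restart kafka-connect'
    sleep 60   # let the worker rejoin and the rebalance settle before moving on
  done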

I would recommend trying a single cluster but monitoring whether you see
stalls due to rebalances. If you do, then moving to multiple clusters might
make sense. (This also, obviously, depends a lot on your SLA for data
delivery.)


>  - Monitoring operations
>

Multiple clusters definitely seem messier and more complicated for this.
There will be more workers in a single cluster, but it's a single service
you need to monitor and maintain.
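
With a single cluster you can watch everything by polling one REST endpoint,
e.g. (assumes jq is installed and connect-host points at any worker):

  for c in $(curl -s http://connect-host:8083/connectors | jq -r '.[]'); do
    curl -s http://connect-host:8083/connectors/$c/status \
      | jq -c '{name, connector: .connector.state, tasks: [.tasks[].state]}'
  done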

Hope that helps!

-Ewen


>
> Thanks for your guidance
>
> Regards,
> Stephane
>