Posted to users@kafka.apache.org by Jeff Widman <je...@netskope.com> on 2017/05/11 23:07:44 UTC

Why do I need to specify replication factor when creating a topic?

When creating a new topic, why do I need to specify the replication factor
and number of partitions?

I'd rather that, when omitted, Kafka defaults to the value set in
server.properties.

Was this an explicit design decision?
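
For concreteness, the server.properties values in question are the broker settings `num.partitions` and `default.replication.factor`, both of which default to 1 when unset. A minimal Python sketch, assuming the client can read a copy of the broker's server.properties (in practice it usually cannot, which is why the question arises); `broker_topic_defaults` is a hypothetical helper, not part of any Kafka library:

```python
# Broker settings that supply topic defaults, and the values Kafka uses
# when server.properties does not set them.
BUILTIN_DEFAULTS = {"num.partitions": 1, "default.replication.factor": 1}


def broker_topic_defaults(properties_text: str) -> dict:
    """Return the partition and replication defaults a broker would apply,
    given the text of its server.properties (hypothetical helper).

    Deliberately simplified: handles "key=value" and "key: value" lines and
    "#"/"!" comments, but not line continuations or escape sequences.
    """
    result = dict(BUILTIN_DEFAULTS)
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        # Split on whichever separator ("=" or ":") appears first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue  # no separator; not a key/value line
        sep = min(positions)
        key, value = line[:sep].strip(), line[sep + 1:].strip()
        if key in result:
            result[key] = int(value)
    return result


example = """
# single-broker dev cluster
num.partitions=3
default.replication.factor = 1
log.dirs=/tmp/kafka-logs
"""
print(broker_topic_defaults(example))
# {'num.partitions': 3, 'default.replication.factor': 1}
```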

RE: Why do I need to specify replication factor when creating a topic?

Posted by Thomas Becker <to...@Tivo.com>.
I can only assume that it was. I have heard it stated as a goal that it should be easier to simply create topics with the "defaults" but we seem to be making only incremental progress towards that goal. I don't know if the assumption is that most users are routinely varying the number of partitions and replication factor on a per-topic basis so this is less of a burden or what. In our case it's virtually the opposite; we set defaults and deviating from them is the exception, not the rule.

________________________________________
From: Jeff Widman [jeff@netskope.com]
Sent: Friday, May 12, 2017 11:45 AM
To: users@kafka.apache.org
Subject: Re: Why do I need to specify replication factor when creating a topic?

> The problem is that the AdminUtils requires this info to be known client
> side, but there is no API to get it.

Why does the client side need it? If the broker can auto-create topics,
then the broker is aware of the default param.

> I think things will be better in 0.11.0 where we have the AdminClient
> that includes support for both topic CRUD APIs (not just ZK modifications
> like AdminUtils does) and APIs to get configs. But as far as I'm aware it
> will still be 2 calls (1 to get the default configs, another to create the
> topics with those configs).

We're a python shop, so generally our interface is the Protocol APIs, not
the Java AdminClient. I've been looking forward to using the CRUD APIs for
a while. However, it sounds like the CRUD APIs still require explicitly
including the replication factor param in the CreateTopics call.

That's essentially the crux of my question... why does the client ever need
to know the default param if the broker is already aware of it? Was this an
explicit design decision?

On Fri, May 12, 2017 at 6:11 AM, Thomas Becker <to...@tivo.com> wrote:

> Yes, this has been an issue for some time. The problem is that the
> AdminUtils requires this info to be known client side, but there is no API
> to get it. I think things will be better in 0.11.0 where we have the
> AdminClient that includes support for both topic CRUD APIs (not just ZK
> modifications like AdminUtils does) and APIs to get configs. But as far as
> I'm aware it will still be 2 calls (1 to get the default configs, another
> to create the topics with those configs).
>
> -Tommy
>
> ________________________________________
> From: Jeff Widman [jeff@netskope.com]
> Sent: Thursday, May 11, 2017 7:42 PM
> To: users@kafka.apache.org
> Subject: Re: Why do I need to specify replication factor when creating a
> topic?
>
> To further clarify:
> I'm trying to create topics programmatically.
>
> We want to run our code against dev/staging/production clusters. In dev,
> they are often single-broker clusters. In production, we default to
> replication factor of 3.
>
> So that's why it'd make life easier if it defaulted to the value in
> server.properties, rather than our code having to figure out whether it's a
> dev vs production cluster.
>
> I'm aware we could hack around this by relying on topic auto-creation, but
> we'd rather disable that to prevent topics being accidentally created.
>
> On Thu, May 11, 2017 at 4:07 PM, Jeff Widman <je...@netskope.com> wrote:
>
> > When creating a new topic, why do I need to specify the replication factor
> > and number of partitions?
> >
> > I'd rather that, when omitted, Kafka defaults to the value set in
> > server.properties.
> >
> > Was this an explicit design decision?
> >
>
> ________________________________
>
> This email and any attachments may contain confidential and privileged
> material for the sole use of the intended recipient. Any review, copying,
> or distribution of this email (or any attachments) by others is prohibited.
> If you are not the intended recipient, please contact the sender
> immediately and permanently delete this email and any attachments. No
> employee or agent of TiVo Inc. is authorized to conclude any binding
> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
> Inc. may only be made by a signed written agreement.
>

Re: Why do I need to specify replication factor when creating a topic?

Posted by Jeff Widman <je...@netskope.com>.
> The problem is that the AdminUtils requires this info to be known client
> side, but there is no API to get it.

Why does the client side need it? If the broker can auto-create topics,
then the broker is aware of the default param.

> I think things will be better in 0.11.0 where we have the AdminClient
> that includes support for both topic CRUD APIs (not just ZK modifications
> like AdminUtils does) and APIs to get configs. But as far as I'm aware it
> will still be 2 calls (1 to get the default configs, another to create the
> topics with those configs).

We're a python shop, so generally our interface is the Protocol APIs, not
the Java AdminClient. I've been looking forward to using the CRUD APIs for
a while. However, it sounds like the CRUD APIs still require explicitly
including the replication factor param in the CreateTopics call.

That's essentially the crux of my question... why does the client ever need
to know the default param if the broker is already aware of it? Was this an
explicit design decision?
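
To make the constraint concrete, here is a simplified Python model of one topic entry in a CreateTopics request. The field names follow the protocol schema as we understand it; `CreatableTopic` and `validate` are hypothetical, not part of any client library. It assumes that in request versions 0 to 3 the counts may be -1 ("unset") only alongside a manual replica assignment, and that version 4 (added later, in Kafka 2.4, via KIP-464) lets -1 mean "use the broker's default".

```python
from dataclasses import dataclass, field
from typing import Dict, List

UNSET = -1  # protocol sentinel for "not set"


@dataclass
class CreatableTopic:
    """Simplified, hypothetical model of one topic entry in a CreateTopics
    request. Field names follow the protocol schema; this is not a class
    from any Kafka client."""
    name: str
    num_partitions: int = UNSET
    replication_factor: int = UNSET
    # Manual replica assignment: partition id -> list of broker ids.
    assignments: Dict[int, List[int]] = field(default_factory=dict)
    configs: Dict[str, str] = field(default_factory=dict)


def validate(topic: CreatableTopic, request_version: int) -> None:
    """Raise ValueError if the entry breaks the rules assumed above."""
    if topic.assignments:
        # With a manual assignment, the counts are implied by the
        # assignment itself and must be left unset.
        if topic.num_partitions != UNSET or topic.replication_factor != UNSET:
            raise ValueError("leave counts unset when assignments are given")
        return
    missing = UNSET in (topic.num_partitions, topic.replication_factor)
    if missing and request_version < 4:
        # Before v4 there is no "use the broker default" option.
        raise ValueError(
            "CreateTopics v%d needs explicit num_partitions and "
            "replication_factor" % request_version
        )


validate(CreatableTopic("events", num_partitions=3, replication_factor=3), 2)
validate(CreatableTopic("events"), 4)  # v4+: broker fills in its defaults
```

Under those assumptions, a client speaking v4 or later can omit both values and get the behavior asked for here, while one speaking an older version has to supply them itself.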

RE: Why do I need to specify replication factor when creating a topic?

Posted by Thomas Becker <to...@Tivo.com>.
Yes, this has been an issue for some time. The problem is that the AdminUtils requires this info to be known client side, but there is no API to get it. I think things will be better in 0.11.0 where we have the AdminClient that includes support for both topic CRUD APIs (not just ZK modifications like AdminUtils does) and APIs to get configs. But as far as I'm aware it will still be 2 calls (1 to get the default configs, another to create the topics with those configs).

-Tommy
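
A sketch of the two-call approach described above, in Python since that is what Jeff's team uses. The first call (DescribeConfigs against a broker) is represented only by its assumed result, a plain dict of config name to string value, so the sketch runs without a cluster. `resolve_topic_settings` is hypothetical; `num.partitions` and `default.replication.factor` are the real broker setting names, and the production numbers are illustrative.

```python
from typing import Mapping, Optional, Tuple


def resolve_topic_settings(
    broker_config: Mapping[str, str],
    num_partitions: Optional[int] = None,
    replication_factor: Optional[int] = None,
) -> Tuple[int, int]:
    """Second half of the two-call approach (hypothetical helper).

    broker_config is assumed to be the result of the first call, a
    DescribeConfigs request for a broker resource, flattened into
    {config name: value}; config values arrive as strings.
    Anything the caller passes explicitly wins over the broker default.
    """
    if num_partitions is None:
        num_partitions = int(broker_config["num.partitions"])
    if replication_factor is None:
        replication_factor = int(broker_config["default.replication.factor"])
    return num_partitions, replication_factor


# Illustrative broker configs for a single-broker dev cluster and production.
dev_broker = {"num.partitions": "1", "default.replication.factor": "1"}
prod_broker = {"num.partitions": "6", "default.replication.factor": "3"}
print(resolve_topic_settings(dev_broker))                       # (1, 1)
print(resolve_topic_settings(prod_broker))                      # (6, 3)
print(resolve_topic_settings(prod_broker, num_partitions=12))   # (12, 3)
```

Because the defaults come from whichever cluster is queried, the calling code does not need to know whether it is pointed at dev or production; the price is the extra round trip.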

Re: Why do I need to specify replication factor when creating a topic?

Posted by Andrew Psaltis <ps...@gmail.com>.
Jeff,
I'm not sure if this is an option for you. However, I have faced a
similar problem before, and we handled it by putting all of the
information needed to connect to and use the Kafka APIs in a config file. In
our case we were using Typesafe Config [1] for much of the configuration in our
services. It has quite a few nice features, and we were able to change the
values per environment with Puppet and Chef. I'm not sure whether you
are using these or other tools in the DevOps space.

Hope that helps. Ping me offline if you want to chat about it more.

Thanks,
Andrew
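
Andrew's approach, sketched in Python rather than with Typesafe Config (a JVM library): topic settings live in one table keyed by environment, and deployment tooling selects the entry. The `APP_ENV` variable, the table layout, and the partition counts are assumptions for illustration; the replication factors follow Jeff's description (single-broker dev, 3 in production), and staging would simply be another entry.

```python
import os
from typing import Dict, Mapping

# Per-environment topic settings. Partition counts are placeholders.
# In practice this table could live in a file that Puppet or Chef
# renders differently for each environment.
TOPIC_SETTINGS: Dict[str, Dict[str, int]] = {
    "dev": {"num_partitions": 1, "replication_factor": 1},
    "production": {"num_partitions": 6, "replication_factor": 3},
}


def topic_settings(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Return the settings for the environment named by the (hypothetical)
    APP_ENV variable, defaulting to "dev"."""
    name = env.get("APP_ENV", "dev")
    if name not in TOPIC_SETTINGS:
        raise ValueError(f"no topic settings for environment {name!r}")
    return dict(TOPIC_SETTINGS[name])  # copy so callers can't mutate the table


print(topic_settings({"APP_ENV": "production"}))
# {'num_partitions': 6, 'replication_factor': 3}
```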

Re: Why do I need to specify replication factor when creating a topic?

Posted by Jeff Widman <je...@netskope.com>.
To further clarify:
I'm trying to create topics programmatically.

We want to run our code against dev/staging/production clusters. In dev,
they are often single-broker clusters. In production, we default to
replication factor of 3.

So that's why it'd make life easier if it defaulted to the value in
server.properties, rather than our code having to figure out whether it's a
dev vs production cluster.

I'm aware we could hack around this by relying on topic auto-creation, but
we'd rather disable that to prevent topics being accidentally created.
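
One workaround not proposed in the thread, sketched under stated assumptions: derive the replication factor from the cluster itself by capping the production value at the number of brokers, which a client can learn from the broker list in a Metadata response. `choose_replication_factor` is a hypothetical helper.

```python
def choose_replication_factor(desired: int, broker_count: int) -> int:
    """Cap the desired replication factor at the number of brokers
    (hypothetical helper). A replication factor larger than the broker
    count cannot be satisfied, and the broker rejects the request."""
    if desired < 1 or broker_count < 1:
        raise ValueError("desired and broker_count must both be at least 1")
    return min(desired, broker_count)


print(choose_replication_factor(3, 1))  # -> 1 (single-broker dev cluster)
print(choose_replication_factor(3, 5))  # -> 3 (production)
```

The trade-off: the Metadata response lists only live brokers, so if a production broker happens to be down at creation time, this quietly creates an under-replicated topic. Failing loudly may be preferable there.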

Re: Why do I need to specify replication factor when creating a topic?

Posted by Hans Jespersen <ha...@confluent.io>.
If you enable auto topic creation, then that is exactly what will happen.

There are pros and cons to creating topics with default values, but if you feel strongly that this is the way you want Kafka to work, it is entirely possible to set up the system to work that way.

-hans