Posted to users@kafka.apache.org by Joe San <co...@gmail.com> on 2017/03/14 17:09:25 UTC

Kafka Retention Policy to Indefinite

Dear Kafka Users,

What are the arguments against setting the retention policy on a Kafka
topic to infinite? I was in an interesting discussion with one of my
colleagues, who suggested setting the retention policy for a topic
to indefinite.

So how does this play out when adding new partitions? Say I have
accumulated some gigabytes of data in my topic and now I realize that I
have to scale up by adding another partition. Is this going to pose a
problem for me? A partition rebalance has to happen, and I'm not sure what
the implications are of rebalancing a partition that holds gigabytes of data.

Any thoughts on this?

Thanks and Regards,
Jothi

Re: Kafka Retention Policy to Indefinite

Posted by Joe San <co...@gmail.com>.
>
> I am saying that replication quotas will mitigate one of the potential
> downsides of setting an infinite retention policy.


I was just interested in all of the potential downsides! Could you
please point me to documentation that has more information on this?


Re: Kafka Retention Policy to Indefinite

Posted by Hans Jespersen <ha...@confluent.io>.
I am saying that replication quotas will mitigate one of the potential
downsides of setting an infinite retention policy.

There is no clear-cut yes/no best-practice rule for setting an extremely
large retention policy. It is clearly a valid configuration, and there are
people who run this way.

The issues have more to do with the amount of data you expect to be stored
over the life of the system. If you have a Kafka cluster with petabytes of
data in it and a consumer comes along and blindly consumes from the
beginning, it will be getting a lot of data. So much so that this might
be considered an anti-pattern, because the apps might not behave as
expected and the network bandwidth used by lots of clients operating this
way can be excessive.
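
To put rough numbers on the replay concern, here is a minimal
back-of-the-envelope sketch. The 1 PB topic size, the 100 MB/s consumer
rate, and the function names are all hypothetical values chosen for
illustration, not measurements from any real cluster.

```python
def replay_hours(topic_bytes: float, consumer_bytes_per_sec: float) -> float:
    """Hours needed to read a topic from the earliest offset at a steady rate."""
    if consumer_bytes_per_sec <= 0:
        raise ValueError("consumer throughput must be positive")
    return topic_bytes / consumer_bytes_per_sec / 3600


def aggregate_egress(consumers: int, consumer_bytes_per_sec: float) -> float:
    """Total broker egress in bytes/s when several consumers replay at once."""
    return consumers * consumer_bytes_per_sec


PB = 10**15  # decimal petabyte
MB = 10**6   # decimal megabyte

# Hypothetical: one consumer replaying 1 PB at a steady 100 MB/s.
hours = replay_hours(1 * PB, 100 * MB)

# Hypothetical: ten such consumers replaying at the same time.
egress = aggregate_egress(10, 100 * MB)

print(f"full replay: {hours:.0f} hours (~{hours / 24:.0f} days)")
print(f"aggregate egress: {egress / MB:.0f} MB/s")
```

Under these assumptions a single full replay takes months, which is why a
consumer that starts from the earliest offset by default deserves a
deliberate decision rather than an accident.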

Another way to avoid collecting too much data is to use compacted topics,
which are a special kind of topic that keeps the latest value for each key
forever but removes the older messages with the same key in order to
reduce the total number of messages stored.
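
The effect of compaction described above can be sketched as a small
in-memory simulation. This is only an illustration of the end result
(latest value per key survives), not how the broker implements it: the
real log cleaner runs in the background, so older duplicates can remain
visible for a while. The `compact` function and the sample records are
hypothetical; on a real topic the behaviour is enabled with the topic
config `cleanup.policy=compact`.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (key, value)


def compact(log: Iterable[Record]) -> List[Record]:
    """Keep only the most recent record for each key, preserving log order."""
    records = list(log)
    latest_index: Dict[str, int] = {}
    for i, (key, _value) in enumerate(records):
        latest_index[key] = i  # later records overwrite earlier positions
    return [rec for i, rec in enumerate(records) if latest_index[rec[0]] == i]


# Hypothetical log with repeated keys.
log = [
    ("user-1", "a"),
    ("user-2", "b"),
    ("user-1", "c"),
    ("user-3", "d"),
    ("user-2", "e"),
]

compacted = compact(log)
print(compacted)
```

The stored size of a compacted topic is bounded by the number of distinct
keys rather than by how long the topic has existed.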

How much data do you expect to store in your largest topic over the life of
the cluster?
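
One rough way to answer that question is a simple growth estimate. The
sketch below assumes time-based retention is disabled (topic config
`retention.ms=-1`), a constant write rate, and no compression or
compaction; the message rate, message size, lifetime, and replication
factor are hypothetical inputs.

```python
def stored_bytes(msgs_per_sec: int, avg_msg_bytes: int,
                 days: int, replication_factor: int) -> int:
    """Total bytes on disk across all replicas when nothing is ever deleted."""
    seconds = days * 24 * 60 * 60
    return msgs_per_sec * avg_msg_bytes * seconds * replication_factor


TB = 10**12  # decimal terabyte

# Hypothetical: 1,000 msgs/s of 1,000 bytes each, kept for 5 years, 3 replicas.
five_years = stored_bytes(1000, 1000, 5 * 365, 3)

print(f"{five_years / TB:.2f} TB across all replicas")
```

Even a modest 1 MB/s write rate adds up to hundreds of terabytes over a
few years once replication is included, so the estimate is worth doing
before committing to infinite retention.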

-hans





/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * hans@confluent.io (650)924-2670
 */


Re: Kafka Retention Policy to Indefinite

Posted by Joe San <co...@gmail.com>.
So that means with replication quotas, I can set the retention policy to be
infinite?


Re: Kafka Retention Policy to Indefinite

Posted by Hans Jespersen <ha...@confluent.io>.
You might want to use the new replication quotas mechanism (i.e. network
throttling) to make sure that replication traffic doesn't negatively impact
your production traffic.

For details, see:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas

This feature was added in Kafka 0.10.1.
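
As a minimal sketch of how one might size such a throttle, the snippet
below computes the rate needed to move a given amount of partition data
within a time window and pairs it with the broker-level rate settings
described in KIP-73 (`leader.replication.throttled.rate` and
`follower.replication.throttled.rate`). The 500 GB data size, the
six-hour window, and the helper names are hypothetical, and the
calculation ignores new writes arriving during the move.

```python
from typing import Dict


def required_throttle(bytes_to_move: int, window_seconds: int) -> int:
    """Smallest whole bytes/s rate that moves the data within the window."""
    if window_seconds <= 0:
        raise ValueError("window must be positive")
    return -(-bytes_to_move // window_seconds)  # ceiling division


def throttle_configs(rate_bytes_per_sec: int) -> Dict[str, str]:
    """Broker-level throttle settings named in KIP-73, as string values."""
    return {
        "leader.replication.throttled.rate": str(rate_bytes_per_sec),
        "follower.replication.throttled.rate": str(rate_bytes_per_sec),
    }


GB = 10**9  # decimal gigabyte

# Hypothetical: move 500 GB of partition data within a six-hour window.
rate = required_throttle(500 * GB, 6 * 3600)
configs = throttle_configs(rate)

for name, value in configs.items():
    print(f"{name}={value}")
```

In practice the throttle also has to stay above the rate of incoming
writes to the partitions being moved, otherwise the new replicas never
catch up.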

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * hans@confluent.io (650)924-2670
 */
