
Posted to dev@kafka.apache.org by Alexei Zenin <al...@mail.utoronto.ca> on 2017/10/14 14:48:48 UTC

Possible Feature: Topic Retention Policy

Hi,

I have come across a few Stack Overflow posts on the subject of request-response semantics over Kafka. In the approaches I've read about, developers use Kafka's automatic topic creation feature (or the AdminClient) to dynamically create a topic per request-response channel. They mention that the number of processes that need to communicate can vary, so each process uses some form of ID to create a unique topic for itself. See https://stackoverflow.com/questions/35535785/does-kafka-support-request-response-messaging and https://stackoverflow.com/questions/46662102/correlating-in-kafka-and-dynamic-topics/46678198.

This approach, however, leads to several problems:

1. The maximum number of such channels (topics) is limited by the memory of a single ZooKeeper (ZK) node, since ZK holds its data in memory and is not sharded.
2. ZK performs best when reads outnumber writes, so creating topics at a high rate could degrade ZK cluster performance.
3. Once the initiating process has finished communicating, some entity must delete the topic it used, since that topic will never be used again.

Part of this problem could be solved by a Topic Retention Policy (as opposed to a log retention policy), and I find it strange that Kafka does not provide one.

Such a policy would delete topics considered "stale" from both the Kafka cluster and ZK. By deleting topics on the user's behalf, it would reduce the amount of code and the administrative headache that stale topics currently impose on users.

Would the community find value in such a feature, provided it stays true to Kafka's fundamentals and does not require substantial refactoring?
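A minimal sketch of how such a policy might decide what is "stale", assuming (hypothetically) that a topic becomes stale once it has seen no produce or consume activity for a fixed idle timeout. `TopicRegistry`, `touch`, and `reap` are illustrative names, not Kafka APIs; a real implementation would also have to remove the topic's log segments from the brokers and its metadata from ZK.

```python
from dataclasses import dataclass, field


@dataclass
class TopicRegistry:
    """Hypothetical cluster-side bookkeeping for a topic retention policy (not a Kafka API)."""

    idle_timeout_s: float
    last_activity: dict[str, float] = field(default_factory=dict)

    def touch(self, topic: str, now: float) -> None:
        # Record a produce or consume on the topic (registers it if new).
        self.last_activity[topic] = now

    def stale_topics(self, now: float) -> list[str]:
        # A topic is "stale" once it has been idle for at least idle_timeout_s seconds.
        return sorted(
            topic
            for topic, last in self.last_activity.items()
            if now - last >= self.idle_timeout_s
        )

    def reap(self, now: float) -> list[str]:
        # Forget stale topics; a real broker would also drop their logs and ZK metadata.
        stale = self.stale_topics(now)
        for topic in stale:
            del self.last_activity[topic]
        return stale


registry = TopicRegistry(idle_timeout_s=60)
registry.touch("responses-client-a", now=0)
registry.touch("responses-client-b", now=50)
print(registry.reap(now=90))  # ['responses-client-a']
```

Keying staleness on last activity rather than on creation time is one possible design choice: it keeps a long-lived but busy channel from being deleted while it is still in use.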

Alexei

Re: Possible Feature: Topic Retention Policy

Posted by Guozhang Wang <wa...@gmail.com>.

Hello Alexei,

Thanks for bringing up this question. Just my 2 cents:

1. For request-response messaging, I think an alternative approach is to
use a single topic as the request queue and one temporary topic per client
as its response queue. I.e. everyone sends their requests to the single
request topic and waits on their own topic for the response. After
receiving the response from that topic, a client can delete it with the
admin client before leaving.

2. One disadvantage of having a topic retention policy is that in a shared
Kafka cluster, different users would expect quite different retention
policies, so at the end of the day we would need a per-topic retention
policy. Going back to the motivating use case, the user would then need to
specify the retention policy for each temporary topic when creating it. I
think this pattern is similar to: the client creates the topic (without
specifying a retention policy), then deletes the topic after receiving the
expected response from it. Note that with the admin client, users can
programmatically delete a topic once they have finished using it, so this
need not introduce administrative headaches for operations teams.
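The single-request-topic pattern from point 1 can be sketched as follows. This is a toy, in-process simulation under stated assumptions: `InMemoryBroker` stands in for a Kafka cluster, the topic names and the `id`/`reply_to` message fields are hypothetical conventions, and the server is invoked inline rather than running in its own process. With the real Java admin client, the create and delete steps would correspond to `AdminClient.createTopics` and `AdminClient.deleteTopics`.

```python
import uuid
from collections import deque
from typing import Optional

REQUEST_TOPIC = "requests"  # hypothetical name of the single shared request topic


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: each topic is a FIFO queue (hypothetical)."""

    def __init__(self) -> None:
        self.topics: dict[str, deque] = {}

    def create_topic(self, name: str) -> None:
        self.topics.setdefault(name, deque())

    def delete_topic(self, name: str) -> None:
        self.topics.pop(name, None)

    def send(self, topic: str, message: dict) -> None:
        self.topics[topic].append(message)

    def poll(self, topic: str) -> Optional[dict]:
        queue = self.topics[topic]
        return queue.popleft() if queue else None


def serve_one(broker: InMemoryBroker) -> None:
    """Server side: handle one request and reply on the topic named in reply_to."""
    request = broker.poll(REQUEST_TOPIC)
    if request is not None:
        reply = {"id": request["id"], "body": request["body"].upper()}
        broker.send(request["reply_to"], reply)


def call(broker: InMemoryBroker, client_id: str, body: str) -> str:
    """Client side: send one request and wait for the reply on a temporary topic."""
    reply_topic = f"responses-{client_id}"  # one temporary response topic per client
    broker.create_topic(reply_topic)
    try:
        request_id = str(uuid.uuid4())  # correlation id used to match the reply
        broker.send(REQUEST_TOPIC, {"id": request_id, "reply_to": reply_topic, "body": body})
        serve_one(broker)  # stands in for a server process running concurrently
        reply = broker.poll(reply_topic)
        if reply is None or reply["id"] != request_id:
            raise RuntimeError("no matching reply")
        return reply["body"]
    finally:
        broker.delete_topic(reply_topic)  # clean up the temporary topic before leaving


broker = InMemoryBroker()
broker.create_topic(REQUEST_TOPIC)
print(call(broker, "client-a", "ping"))  # PING
print(sorted(broker.topics))             # ['requests']
```

The `try`/`finally` deletes the temporary response topic even if the call fails, which is the kind of programmatic cleanup the admin client makes possible.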
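To illustrate the per-topic retention policy that point 2 argues would be needed, here is a sketch of a hypothetical metadata store (`Cluster`, `create_topic`, and `expire` are not Kafka APIs) in which the creator may attach a TTL to each topic at creation time. Topics created without a TTL are never expired, so one user's policy cannot affect other users' topics in a shared cluster; the `responses-b` case shows the alternative of the client simply deleting its own topic after use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopicMeta:
    created_at: float
    ttl_s: Optional[float]  # None means keep forever (the default for ordinary topics)


class Cluster:
    """Hypothetical metadata store with a per-topic retention policy (not a Kafka API)."""

    def __init__(self) -> None:
        self.topics: dict[str, TopicMeta] = {}

    def create_topic(self, name: str, now: float, ttl_s: Optional[float] = None) -> None:
        # The creator states the retention policy for this topic only.
        self.topics[name] = TopicMeta(created_at=now, ttl_s=ttl_s)

    def delete_topic(self, name: str) -> None:
        self.topics.pop(name, None)

    def expire(self, now: float) -> list[str]:
        # Delete only topics whose own TTL has elapsed; all others are untouched.
        expired = sorted(
            name
            for name, meta in self.topics.items()
            if meta.ttl_s is not None and now - meta.created_at >= meta.ttl_s
        )
        for name in expired:
            del self.topics[name]
        return expired


cluster = Cluster()
cluster.create_topic("orders", now=0)                 # shared topic, no TTL
cluster.create_topic("responses-a", now=0, ttl_s=30)  # temporary, TTL chosen by its creator
cluster.create_topic("responses-b", now=0)            # temporary, no TTL...
cluster.delete_topic("responses-b")                   # ...deleted by the client after use
print(cluster.expire(now=45))  # ['responses-a']
print(sorted(cluster.topics))  # ['orders']
```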

Guozhang

