Posted to dev@kafka.apache.org by hudeqi <16...@bjtu.edu.cn> on 2023/03/07 07:21:36 UTC

Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms

It has been a long time, and this issue has been under discussion for a while. Please allow me to summarize it so that everyone can help decide which direction it should take.

This KIP aims to solve two problems:
1. When a client sets "auto.offset.reset" to "latest" and the topic's partitions are then expanded, the consumer resets its offset on each new partition to the latest, so data already written to the new partitions may be lost.

2. Beyond the "earliest", "latest", and "none" values that "auto.offset.reset" already provides, offer richer options:
    a. "latest_on_start": reset to latest at application startup, and throw an exception if an out-of-range error occurs.
    b. "earliest_on_start": reset to earliest at application startup, and throw an exception if an out-of-range error occurs.
    c. "nearest": at application startup, the offset is determined by "auto.offset.reset"; when an out-of-range error occurs, reset to earliest or latest, depending on whether the current offset is closer to the log start offset or to the log end offset.
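
To make the "nearest" rule described above concrete, here is a minimal sketch in Python. It is not taken from the KIP or from the Kafka code base: the function name, its arguments, and the tie-breaking choice are assumptions made for illustration only.

```python
def nearest_reset(current_offset: int, log_start_offset: int, log_end_offset: int) -> str:
    """Pick "earliest" or "latest" for an out-of-range offset (hypothetical helper)."""
    if log_start_offset <= current_offset <= log_end_offset:
        # The rule only applies once the offset is out of range.
        raise ValueError("offset is in range; no reset is needed")
    distance_to_start = abs(current_offset - log_start_offset)
    distance_to_end = abs(current_offset - log_end_offset)
    # Assumption: a tie goes to "earliest"; the thread does not say how ties are broken.
    return "earliest" if distance_to_start <= distance_to_end else "latest"


if __name__ == "__main__":
    # The log currently holds offsets 100..200.
    print(nearest_reset(5, 100, 200))    # earliest (offset fell below the log start)
    print(nearest_reset(250, 100, 200))  # latest (offset is beyond the log end)
```

With this reading, an offset that has been removed by retention resets to earliest, while an offset beyond the log end resets to latest.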

Judging from the discussion above, there are two concerns about adding these extra offset reset mechanisms: complexity and compatibility. Still, these parameters do bring real benefits. Based on that discussion, I have worked out two possible directions. Please help me decide which one to follow:

1. The first is to follow Guozhang's suggestion: keep the three values of "auto.offset.reset" and their meanings unchanged, which reduces confusion for Kafka users and also resolves the compatibility problem. Add these two parameters:
    a. "auto.offset.reset.on.no.initial.offset": The strategy used to initialize the offset. It defaults to the value configured for "auto.offset.reset", in which case offset initialization behaves exactly as before, preserving compatibility. If it is set to "latest_on_start" or "earliest_on_start", the offset is reset according to those semantics when it is initialized. This solves the problem of data loss during partition expansion: set "auto.offset.reset.on.no.initial.offset" to "latest_on_start" and "auto.offset.reset" to "earliest".
    b. "auto.offset.reset.on.invalid.offset": The strategy used when the offset is invalid or out of range. It defaults to the value configured for "auto.offset.reset", in which case out-of-range handling is the same as before, preserving compatibility. If it is set to "nearest", the "nearest" semantics apply only to the out-of-range case.

This solution preserves compatibility and leaves the semantics of the existing configuration unchanged. It only adds two new configurations to handle the different situations flexibly.

2. The second is to reduce the complexity of the problem directly: build into "auto.offset.reset"="latest" the logic of resetting the initial offset of a newly added partition to the earliest. Kafka users then do not need to be aware of this subtle but useful change, and all other cases are handled as before (without introducing a rich set of offset handling mechanisms).
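
The two directions can be compared with a small Python sketch. The two "auto.offset.reset.on.*" keys are the ones proposed in this thread, not existing Kafka consumer configurations. The helper names, and the "partition_is_new" flag standing in for however a newly added partition would be detected, are hypothetical.

```python
BASE = "auto.offset.reset"
# Keys proposed under direction 1; they do not exist in released Kafka clients.
ON_NO_INITIAL = "auto.offset.reset.on.no.initial.offset"
ON_INVALID = "auto.offset.reset.on.invalid.offset"

DEFAULT_RESET = "latest"  # Kafka's documented default for auto.offset.reset


def effective_strategy(config: dict, key: str) -> str:
    """Direction 1: a proposed key falls back to auto.offset.reset when unset."""
    base = config.get(BASE, DEFAULT_RESET)
    return config.get(key, base)


def direction2_initial_reset(auto_offset_reset: str, partition_is_new: bool) -> str:
    """Direction 2: no new keys; "latest" starts a newly added partition at earliest."""
    if auto_offset_reset == "latest" and partition_is_new:
        return "earliest"
    return auto_offset_reset


if __name__ == "__main__":
    # The combination suggested under direction 1 for the partition expansion case.
    config = {BASE: "earliest", ON_NO_INITIAL: "latest_on_start"}
    print(effective_strategy(config, ON_NO_INITIAL))  # latest_on_start
    print(effective_strategy(config, ON_INVALID))     # earliest (falls back to the base value)
    print(direction2_initial_reset("latest", True))   # earliest
    print(direction2_initial_reset("latest", False))  # latest
```

The fallback in "effective_strategy" is what keeps direction 1 compatible: a client that sets only "auto.offset.reset" resolves both new keys to that same value.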

I would appreciate your help in choosing a direction for this issue. Thank you.

Best,
hudeqi

Re: Re: Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms

Posted by hudeqi <16...@bjtu.edu.cn>.
Here is the link to the updated KIP again: https://cwiki.apache.org/confluence/display/KAFKA/KIP-842%3A+Add+richer+group+offset+reset+mechanisms

[DISCUSS] KIP-842: Add richer group offset reset mechanisms

Posted by hudeqi <16...@bjtu.edu.cn>.
Is there any further interest in this KIP? :)
Bumping this thread.

Best,
hudeqi


> -----Original Message-----
> From: hudeqi <16120374@bjtu.edu.cn>
> Sent: 2023-03-26 17:42:31 (Sunday)
> To: dev@kafka.apache.org
> Cc: 
> Subject: Re: Re: Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms
> 

Re: Re: Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms

Posted by hudeqi <16...@bjtu.edu.cn>.
Is there any further interest in this KIP?
Bumping this thread.

Best,
hudeqi

Re: Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms

Posted by hudeqi <16...@bjtu.edu.cn>.
Hello, has anyone who took part in the earlier discussion seen this? New participants are also welcome to join the discussion.
