You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Beyene, Mehari" <me...@amazon.com.INVALID> on 2023/05/30 20:07:04 UTC

[DISCUSS] KIP-937 Improve Message Timestamp Validation

Hi Everyone,

I would like to start a discussion on KIP-937: Improve Message Timestamp Validation (https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation).
This is a small KIP that aims to tighten the current validation logic of a message timestamp.

Thanks,
Mehari

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Hi Gaurav,

You have a valid concern regarding the proposed default value for log.message.timestamp.after.max.ms being only 1 hour. 

We can keep this as Long.MAX_VALUE for the first phase implementation of this KIP and change the default value to 1 hour in future iterations. In the meantime, users can override this configuration and protect against messages with future timestamps.

I have updated the KIP to reflect this change.

Thanks,
Mehari


Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Gaurav Badoni <ga...@gmail.com>.
Hi Mehari,

Thanks for the KIP!

The default value of *log.message.timestamp.after.max.ms
<http://log.message.timestamp.after.max.ms>* is proposed to be 1 hour.
Given that this is a client application breaking change, should we consider
an opt-in grace period to give a proper heads up to the users? We could
consider making the default as Long.MAX and printing a message that it will
be made to 1 hour in the next major version bump. Subsequently, we could
consider making the default as 1 hour in the next major version bump. This
would ensure a smoother transition for current users.

Thanks!
Gaurav

On Wed, Jun 21, 2023 at 1:45 PM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hey Mehari,
> I was just getting annoyed looking at logs of kafka clusters with timestamp
> issues. Let me take final look at the KIP.
>
> Thanks,
> Justine
>
> On Wed, Jun 21, 2023 at 11:13 AM Beyene, Mehari <mehbey@amazon.com.invalid
> >
> wrote:
>
> > Hi Justine,
> >
> > I have initiated the voting process for this KIP here:
> > https://lists.apache.org/thread/y3yfnphsmrgwfdhx3xfhjtwdb7p1dn0v
> > We have already received two binding votes, and I am seeking a third vote
> > for the adoption of the KIP.
> > As you have previously reviewed this KIP, would you be willing to cast
> > your vote?
> >
> > Please also let me know if we need to discuss more or address additional
> > comments.
> >
> > Thanks,
> > Mehari
> >
> >
> >
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hey Mehari,
I was just getting annoyed looking at logs of kafka clusters with timestamp
issues. Let me take final look at the KIP.

Thanks,
Justine

On Wed, Jun 21, 2023 at 11:13 AM Beyene, Mehari <me...@amazon.com.invalid>
wrote:

> Hi Justine,
>
> I have initiated the voting process for this KIP here:
> https://lists.apache.org/thread/y3yfnphsmrgwfdhx3xfhjtwdb7p1dn0v
> We have already received two binding votes, and I am seeking a third vote
> for the adoption of the KIP.
> As you have previously reviewed this KIP, would you be willing to cast
> your vote?
>
> Please also let me know if we need to discuss more or address additional
> comments.
>
> Thanks,
> Mehari
>
>
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Hi Justine,

I have initiated the voting process for this KIP here: https://lists.apache.org/thread/y3yfnphsmrgwfdhx3xfhjtwdb7p1dn0v 
We have already received two binding votes, and I am seeking a third vote for the adoption of the KIP. 
As you have previously reviewed this KIP, would you be willing to cast your vote?

Please also let me know if we need to discuss more or address additional comments.

Thanks,
Mehari



Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Luke Chen <sh...@gmail.com>.
Hi Mehari,

Thanks for the update.
The KIP LGTM now.

Luke

On Tue, Jun 13, 2023 at 12:20 AM Beyene, Mehari <me...@amazon.com.invalid>
wrote:

> Hi Luke, Divij and All,
>
> I have updated the KIP to incorporate Luke's comment. We now have two new
> configurations for validating the message timestamp difference.  The
> existing configuration, "log.message.timestamp.difference.max.ms", will
> be deprecated. The "before" validation uses the same default value as "
> log.message.timestamp.difference.max.ms" to maintain backward
> compatibility. The "after" validation uses a one-hour default value.
>
> Thanks,
> Mehari
>
>
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Hi Luke, Divij and All,

I have updated the KIP to incorporate Luke's comment. We now have two new configurations for validating the message timestamp difference.  The existing configuration, "log.message.timestamp.difference.max.ms", will be deprecated. The "before" validation uses the same default value as "log.message.timestamp.difference.max.ms" to maintain backward compatibility. The "after" validation uses a one-hour default value.

Thanks,
Mehari



Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Kowshik Prakasam <ko...@gmail.com>.
Hi all,

Please ignore this message. I'm just using this message to bump this thread
so that it will show up in Gaurav's inbox. He wanted to send out review
comments for this KIP.


Cheers,
Kowshik


On Wed, Jun 7, 2023 at 1:39 PM Beyene, Mehari <me...@amazon.com.invalid>
wrote:

> > Although it's more verbose, splitting the configuration into explicit
> ‘past’ and ‘future’ would provide the appropriate tradeoff between
> constraint and flexibility, right?
>
> +1
>
>
>
>
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
> Although it's more verbose, splitting the configuration into explicit ‘past’ and ‘future’ would provide the appropriate tradeoff between constraint and flexibility, right?

+1 





Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Luke, thank you for the suggestion of introducing the two configurations. I have discussed this option internally with Divij, and your suggestion does have its merits. I will update the KIP with your suggestions and will revert back in a day or two.

Kirk, thank you for participating in the KIP discussion. I am unable to think of a reliable way to validate this on the client side with high confidence, as the frame of reference for validation is the Broker's timestamp. I am open to hearing any suggestions for client-side validation.

Thanks,
Mehari


>On 6/7/23, 10:26 AM, "Kirk True" <kirk@kirktrue.pro <ma...@kirktrue.pro>> wrote:


CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.






Hi Mehari,


Thanks for the KIP and keeping it up-to-date with the discussions here!


Question:


1. Is it possible to check for invalid timestamps in the client? Suppose we were to develop a means to determine with high confidence that the user had provided a timestamp in nanoseconds vs. milliseconds, is it worth the trouble to either reject or output some debug logging? Of course, we don’t have the log.message.timestamp.difference.max.ms configuration on the client, and the client may be out of sync enough to cause false positives.


Thanks,
Kirk


> On Jun 6, 2023, at 5:42 AM, Divij Vaidya <divijvaidya13@gmail.com <ma...@gmail.com>> wrote:
>
> Hi Luke
>
> Thank you for your participation in reviewing this KIP.
>
> #1 Updated the KIP with correct configuration names and hyperlinks.
>
> #2 Yes, the semantics change from a perspective that the difference is
> always in the past (or at most 1 hour into the future). Updated the
> compatibility section to represent the same. Will update again after #3 is
> resolved.
>
> #3 I'd propose to introduce 2 new configs, one is "
> log.message.timestamp.before.max.ms", and the other one is "
> log.message.timestamp.after.max.ms".
> This is a great idea because with your proposal, 1\ we can ensure backward
> compatibility (hence safety of this breaking change) in 3.x by keeping the
> defaults of these configurations in line with what "
> log.message.timestamp.difference.max.ms" provides today 2\ and when 4.x has
> modified logic to perform segment rotation based on append time which will
> always be available, we can still re-use these configurations to reject
> messages based on validation of create timestamps. Mehari, thoughts?
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jun 6, 2023 at 11:43 AM Luke Chen <showuon@apache.org <ma...@apache.org>> wrote:
>
>> Hi Beyene,
>>
>> Thanks for the KIP.
>>
>> Questions:
>> 1. We don't have "*max.message.time.difference.ms
>> <http://max.message.time.difference.ms> <http://max.message.time.difference.ms&gt;>*" config, I think you're referring
>> to "log.message.timestamp.difference.max.ms"?
>> 2. After this KIP, the semantics of "
>> log.message.timestamp.difference.max.ms"
>> will be changed, right?
>> We should also mention it in this KIP, maybe compatibility section?
>> And I think the description of "log.message.timestamp.difference.max.ms"
>> will also need to be updated.
>> 3. Personally I like to see the "TIME_DRIFT_TOLERANCE" to be exposed as a
>> config, since after this change, the root issue is still not completed
>> resolved.
>> After this KIP, the 1 hour later of timestamp can still be appended
>> successfully, which might still be an issue for some applications.
>> I'd propose to introduce 2 new configs, one is "
>> log.message.timestamp.before.max.ms", and the other one is "
>> log.message.timestamp.after.max.ms".
>> And then we deprecate "log.message.timestamp.difference.max.ms". WDYT?
>>
>> Thank you.
>> Luke
>>
>> On Tue, Jun 6, 2023 at 8:02 AM Beyene, Mehari <mehbey@amazon.com.inva <ma...@amazon.com.inva>lid>
>> wrote:
>>
>>> Hey Justine and Divij,
>>>
>>> Thank you for the recommendations.
>>> I've made the updates to the KIP and added a new section called "Future
>>> Work: Update Message Format to Include Both Client Timestamp and
>> LogAppend
>>> Timestamp."
>>>
>>> Please take a look when get some time and let me know if there's anything
>>> else you'd like me to address.
>>>
>>> Thanks,
>>> Mehari
>>>
>>> On 6/5/23, 10:16 AM, "Divij Vaidya" <divijvaidya13@gmail.com <ma...@gmail.com> <mailto:
>>> divijvaidya13@gmail.com <ma...@gmail.com>>> wrote:
>>>
>>>
>>> CAUTION: This email originated from outside of the organization. Do not
>>> click links or open attachments unless you can confirm the sender and
>> know
>>> the content is safe.
>>>
>>>
>>>
>>>
>>>
>>>
>>> Hey Justine
>>>
>>>
>>> Thank you for bringing this up. We had a discussion earlier in this [1]
>>> thread and concluded that bumping up the message version is a very
>>> expensive operation. Hence, we want to bundle together a bunch of
>>> impactful changes that we will perform on the message version and change
>> it
>>> in v4.0. We are currently capturing the ideas here [2]. The idea of
>> always
>>> having a log append time is captured in point 4 in the above wiki of
>> ideas.
>>>
>>>
>>> As you suggested, we will add a new section called "future work" and add
>>> the idea of two timestamps (& why not do it now) over there. Meanwhile,
>>> does the above explanation answer your question on why not to do it right
>>> now?
>>>
>>>
>>> [1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh <https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh> <
>>> https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh> <https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh&gt;>
>>> [2]
>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3 <https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3>
>>> <
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3 <https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3>
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Divij Vaidya
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.inva <ma...@confluent.io.inva>
>>> <mailto:jolshan@confluent.io.inva <ma...@confluent.io.inva>>lid>
>>> wrote:
>>>
>>>
>>>> Hey Mehari,
>>>> Thanks for adding that section. I think one other thing folks have
>>>> considered is including two timestamps in the message format -- one for
>>> the
>>>> client side timestamp and one for the server side. Of course, this
>> would
>>>> require a bump to the message format, and that hasn't happened in a
>>> while.
>>>> Could we include some information on this approach and why we aren't
>>>> pursuing it? I think message format bumps are tricky, but it is worth
>>>> calling out that this is also an option.
>>>>
>>>> Thanks,
>>>> Justine
>>>>
>>>> On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.inva <ma...@amazon.com.inva>
>>> <mailto:mehbey@amazon.com.inva <ma...@amazon.com.inva>>lid>
>>>> wrote:
>>>>
>>>>> Hi Justine,
>>>>>
>>>>> I added a section under Proposed Changes -> Timestamp Validation
>> Logic
>>> to
>>>>> capture how the INVALID_TIMESTAMP is handled in this KIP.
>>>>> Please let me know if there are any additional areas you would like
>> me
>>> to
>>>>> address.
>>>>>
>>>>> Thanks,
>>>>> Mehari
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>






Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Kirk True <ki...@kirktrue.pro>.
Hi Mehari,

Thanks for the KIP and keeping it up-to-date with the discussions here!

Question:

1. Is it possible to check for invalid timestamps in the client? Suppose we were to develop a means to determine with high confidence that the user had provided a timestamp in nanoseconds vs. milliseconds, is it worth the trouble to either reject or output some debug logging? Of course, we don’t have the log.message.timestamp.difference.max.ms configuration on the client, and the client may be out of sync enough to cause false positives.

Thanks,
Kirk

> On Jun 6, 2023, at 5:42 AM, Divij Vaidya <di...@gmail.com> wrote:
> 
> Hi Luke
> 
> Thank you for your participation in reviewing this KIP.
> 
> #1 Updated the KIP with correct configuration names and hyperlinks.
> 
> #2 Yes, the semantics change from a perspective that the difference is
> always in the past (or at most 1 hour into the future). Updated the
> compatibility section to represent the same. Will update again after #3 is
> resolved.
> 
> #3 I'd propose to introduce 2 new configs, one is "
> log.message.timestamp.before.max.ms", and the other one is "
> log.message.timestamp.after.max.ms".
> This is a great idea because with your proposal, 1\ we can ensure backward
> compatibility (hence safety of this breaking change) in 3.x by keeping the
> defaults of these configurations in line with what "
> log.message.timestamp.difference.max.ms" provides today 2\ and when 4.x has
> modified logic to perform segment rotation based on append time which will
> always be available, we can still re-use these configurations to reject
> messages based on validation of create timestamps. Mehari, thoughts?
> 
> --
> Divij Vaidya
> 
> 
> 
> On Tue, Jun 6, 2023 at 11:43 AM Luke Chen <sh...@apache.org> wrote:
> 
>> Hi Beyene,
>> 
>> Thanks for the KIP.
>> 
>> Questions:
>> 1. We don't have "*max.message.time.difference.ms
>> <http://max.message.time.difference.ms>*" config, I think you're referring
>> to "log.message.timestamp.difference.max.ms"?
>> 2. After this KIP, the semantics of "
>> log.message.timestamp.difference.max.ms"
>> will be changed, right?
>> We should also mention it in this KIP, maybe compatibility section?
>> And I think the description of "log.message.timestamp.difference.max.ms"
>> will also need to be updated.
>> 3. Personally I like to see the "TIME_DRIFT_TOLERANCE" to be exposed as a
>> config, since after this change, the root issue is still not completed
>> resolved.
>> After this KIP, the 1 hour later of timestamp can still be appended
>> successfully, which might still be an issue for some applications.
>> I'd propose to introduce 2 new configs, one is "
>> log.message.timestamp.before.max.ms", and the other one is "
>> log.message.timestamp.after.max.ms".
>> And then we deprecate "log.message.timestamp.difference.max.ms". WDYT?
>> 
>> Thank you.
>> Luke
>> 
>> On Tue, Jun 6, 2023 at 8:02 AM Beyene, Mehari <me...@amazon.com.invalid>
>> wrote:
>> 
>>> Hey Justine and Divij,
>>> 
>>> Thank you for the recommendations.
>>> I've made the updates to the KIP and added a new section called "Future
>>> Work: Update Message Format to Include Both Client Timestamp and
>> LogAppend
>>> Timestamp."
>>> 
>>> Please take a look when get some time and let me know if there's anything
>>> else you'd like me to address.
>>> 
>>> Thanks,
>>> Mehari
>>> 
>>> On 6/5/23, 10:16 AM, "Divij Vaidya" <divijvaidya13@gmail.com <mailto:
>>> divijvaidya13@gmail.com>> wrote:
>>> 
>>> 
>>> CAUTION: This email originated from outside of the organization. Do not
>>> click links or open attachments unless you can confirm the sender and
>> know
>>> the content is safe.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Hey Justine
>>> 
>>> 
>>> Thank you for bringing this up. We had a discussion earlier in this [1]
>>> thread and concluded that bumping up the message version is a very
>>> expensive operation. Hence, we want to bundle together a bunch of
>>> impactful changes that we will perform on the message version and change
>> it
>>> in v4.0. We are currently capturing the ideas here [2]. The idea of
>> always
>>> having a log append time is captured in point 4 in the above wiki of
>> ideas.
>>> 
>>> 
>>> As you suggested, we will add a new section called "future work" and add
>>> the idea of two timestamps (& why not do it now) over there. Meanwhile,
>>> does the above explanation answer your question on why not to do it right
>>> now?
>>> 
>>> 
>>> [1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh <
>>> https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh>
>>> [2]
>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
>>> <
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Divij Vaidya
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.inva
>>> <ma...@confluent.io.inva>lid>
>>> wrote:
>>> 
>>> 
>>>> Hey Mehari,
>>>> Thanks for adding that section. I think one other thing folks have
>>>> considered is including two timestamps in the message format -- one for
>>> the
>>>> client side timestamp and one for the server side. Of course, this
>> would
>>>> require a bump to the message format, and that hasn't happened in a
>>> while.
>>>> Could we include some information on this approach and why we aren't
>>>> pursuing it? I think message format bumps are tricky, but it is worth
>>>> calling out that this is also an option.
>>>> 
>>>> Thanks,
>>>> Justine
>>>> 
>>>> On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.inva
>>> <ma...@amazon.com.inva>lid>
>>>> wrote:
>>>> 
>>>>> Hi Justine,
>>>>> 
>>>>> I added a section under Proposed Changes -> Timestamp Validation
>> Logic
>>> to
>>>>> capture how the INVALID_TIMESTAMP is handled in this KIP.
>>>>> Please let me know if there are any additional areas you would like
>> me
>>> to
>>>>> address.
>>>>> 
>>>>> Thanks,
>>>>> Mehari
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>> 


Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Kirk True <ki...@kirktrue.pro>.
Hi Divij/all,

> On Jun 6, 2023, at 5:42 AM, Divij Vaidya <di...@gmail.com> wrote:
> 
> Hi Luke
> 
> Thank you for your participation in reviewing this KIP.
> 
> #1 Updated the KIP with correct configuration names and hyperlinks.
> 
> #2 Yes, the semantics change from a perspective that the difference is
> always in the past (or at most 1 hour into the future). Updated the
> compatibility section to represent the same. Will update again after #3 is
> resolved.
> 
> #3 I'd propose to introduce 2 new configs, one is "
> log.message.timestamp.before.max.ms", and the other one is "
> log.message.timestamp.after.max.ms".
> This is a great idea because with your proposal, 1\ we can ensure backward
> compatibility (hence safety of this breaking change) in 3.x by keeping the
> defaults of these configurations in line with what "
> log.message.timestamp.difference.max.ms" provides today 2\ and when 4.x has
> modified logic to perform segment rotation based on append time which will
> always be available, we can still re-use these configurations to reject
> messages based on validation of create timestamps. Mehari, thoughts?

Although it's more verbose, splitting the configuration into explicit ‘past’ and ‘future’ would provide the appropriate tradeoff between constraint and flexibility, right?

Thanks,
Kirk

> --
> Divij Vaidya
> 
> 
> 
> On Tue, Jun 6, 2023 at 11:43 AM Luke Chen <sh...@apache.org> wrote:
> 
>> Hi Beyene,
>> 
>> Thanks for the KIP.
>> 
>> Questions:
>> 1. We don't have "*max.message.time.difference.ms
>> <http://max.message.time.difference.ms>*" config, I think you're referring
>> to "log.message.timestamp.difference.max.ms"?
>> 2. After this KIP, the semantics of "
>> log.message.timestamp.difference.max.ms"
>> will be changed, right?
>> We should also mention it in this KIP, maybe compatibility section?
>> And I think the description of "log.message.timestamp.difference.max.ms"
>> will also need to be updated.
>> 3. Personally I like to see the "TIME_DRIFT_TOLERANCE" to be exposed as a
>> config, since after this change, the root issue is still not completed
>> resolved.
>> After this KIP, the 1 hour later of timestamp can still be appended
>> successfully, which might still be an issue for some applications.
>> I'd propose to introduce 2 new configs, one is "
>> log.message.timestamp.before.max.ms", and the other one is "
>> log.message.timestamp.after.max.ms".
>> And then we deprecate "log.message.timestamp.difference.max.ms". WDYT?
>> 
>> Thank you.
>> Luke
>> 
>> On Tue, Jun 6, 2023 at 8:02 AM Beyene, Mehari <me...@amazon.com.invalid>
>> wrote:
>> 
>>> Hey Justine and Divij,
>>> 
>>> Thank you for the recommendations.
>>> I've made the updates to the KIP and added a new section called "Future
>>> Work: Update Message Format to Include Both Client Timestamp and
>> LogAppend
>>> Timestamp."
>>> 
>>> Please take a look when get some time and let me know if there's anything
>>> else you'd like me to address.
>>> 
>>> Thanks,
>>> Mehari
>>> 
>>> On 6/5/23, 10:16 AM, "Divij Vaidya" <divijvaidya13@gmail.com <mailto:
>>> divijvaidya13@gmail.com>> wrote:
>>> 
>>> 
>>> CAUTION: This email originated from outside of the organization. Do not
>>> click links or open attachments unless you can confirm the sender and
>> know
>>> the content is safe.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Hey Justine
>>> 
>>> 
>>> Thank you for bringing this up. We had a discussion earlier in this [1]
>>> thread and concluded that bumping up the message version is a very
>>> expensive operation. Hence, we want to bundle together a bunch of
>>> impactful changes that we will perform on the message version and change
>> it
>>> in v4.0. We are currently capturing the ideas here [2]. The idea of
>> always
>>> having a log append time is captured in point 4 in the above wiki of
>> ideas.
>>> 
>>> 
>>> As you suggested, we will add a new section called "future work" and add
>>> the idea of two timestamps (& why not do it now) over there. Meanwhile,
>>> does the above explanation answer your question on why not to do it right
>>> now?
>>> 
>>> 
>>> [1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh <
>>> https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh>
>>> [2]
>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
>>> <
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Divij Vaidya
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.inva
>>> <ma...@confluent.io.inva>lid>
>>> wrote:
>>> 
>>> 
>>>> Hey Mehari,
>>>> Thanks for adding that section. I think one other thing folks have
>>>> considered is including two timestamps in the message format -- one for
>>> the
>>>> client side timestamp and one for the server side. Of course, this
>> would
>>>> require a bump to the message format, and that hasn't happened in a
>>> while.
>>>> Could we include some information on this approach and why we aren't
>>>> pursuing it? I think message format bumps are tricky, but it is worth
>>>> calling out that this is also an option.
>>>> 
>>>> Thanks,
>>>> Justine
>>>> 
>>>> On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.inva
>>> <ma...@amazon.com.inva>lid>
>>>> wrote:
>>>> 
>>>>> Hi Justine,
>>>>> 
>>>>> I added a section under Proposed Changes -> Timestamp Validation
>> Logic
>>> to
>>>>> capture how the INVALID_TIMESTAMP is handled in this KIP.
>>>>> Please let me know if there are any additional areas you would like
>> me
>>> to
>>>>> address.
>>>>> 
>>>>> Thanks,
>>>>> Mehari
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>> 


Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Divij Vaidya <di...@gmail.com>.
Hi Luke

Thank you for your participation in reviewing this KIP.

#1 Updated the KIP with correct configuration names and hyperlinks.

#2 Yes, the semantics change from a perspective that the difference is
always in the past (or at most 1 hour into the future). Updated the
compatibility section to represent the same. Will update again after #3 is
resolved.

#3 I'd propose to introduce 2 new configs, one is "
log.message.timestamp.before.max.ms", and the other one is "
log.message.timestamp.after.max.ms".
This is a great idea because with your proposal, 1\ we can ensure backward
compatibility (hence safety of this breaking change) in 3.x by keeping the
defaults of these configurations in line with what "
log.message.timestamp.difference.max.ms" provides today 2\ and when 4.x has
modified logic to perform segment rotation based on append time which will
always be available, we can still re-use these configurations to reject
messages based on validation of create timestamps. Mehari, thoughts?

--
Divij Vaidya



On Tue, Jun 6, 2023 at 11:43 AM Luke Chen <sh...@apache.org> wrote:

> Hi Beyene,
>
> Thanks for the KIP.
>
> Questions:
> 1. We don't have "*max.message.time.difference.ms
> <http://max.message.time.difference.ms>*" config, I think you're referring
> to "log.message.timestamp.difference.max.ms"?
> 2. After this KIP, the semantics of "
> log.message.timestamp.difference.max.ms"
> will be changed, right?
> We should also mention it in this KIP, maybe compatibility section?
> And I think the description of "log.message.timestamp.difference.max.ms"
> will also need to be updated.
> 3. Personally I like to see the "TIME_DRIFT_TOLERANCE" to be exposed as a
> config, since after this change, the root issue is still not completed
> resolved.
> After this KIP, the 1 hour later of timestamp can still be appended
> successfully, which might still be an issue for some applications.
> I'd propose to introduce 2 new configs, one is "
> log.message.timestamp.before.max.ms", and the other one is "
> log.message.timestamp.after.max.ms".
> And then we deprecate "log.message.timestamp.difference.max.ms". WDYT?
>
> Thank you.
> Luke
>
> On Tue, Jun 6, 2023 at 8:02 AM Beyene, Mehari <me...@amazon.com.invalid>
> wrote:
>
> > Hey Justine and Divij,
> >
> > Thank you for the recommendations.
> > I've made the updates to the KIP and added a new section called "Future
> > Work: Update Message Format to Include Both Client Timestamp and
> LogAppend
> > Timestamp."
> >
> > Please take a look when get some time and let me know if there's anything
> > else you'd like me to address.
> >
> > Thanks,
> > Mehari
> >
> > On 6/5/23, 10:16 AM, "Divij Vaidya" <divijvaidya13@gmail.com <mailto:
> > divijvaidya13@gmail.com>> wrote:
> >
> >
> > CAUTION: This email originated from outside of the organization. Do not
> > click links or open attachments unless you can confirm the sender and
> know
> > the content is safe.
> >
> >
> >
> >
> >
> >
> > Hey Justine
> >
> >
> > Thank you for bringing this up. We had a discussion earlier in this [1]
> > thread and concluded that bumping up the message version is a very
> > expensive operation. Hence, we want to bundle together a bunch of
> > impactful changes that we will perform on the message version and change
> it
> > in v4.0. We are currently capturing the ideas here [2]. The idea of
> always
> > having a log append time is captured in point 4 in the above wiki of
> ideas.
> >
> >
> > As you suggested, we will add a new section called "future work" and add
> > the idea of two timestamps (& why not do it now) over there. Meanwhile,
> > does the above explanation answer your question on why not to do it right
> > now?
> >
> >
> > [1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh <
> > https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh>
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
> > >
> >
> >
> >
> >
> > --
> > Divij Vaidya
> >
> >
> >
> >
> >
> >
> > On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.inva
> > <ma...@confluent.io.inva>lid>
> > wrote:
> >
> >
> > > Hey Mehari,
> > > Thanks for adding that section. I think one other thing folks have
> > > considered is including two timestamps in the message format -- one for
> > the
> > > client side timestamp and one for the server side. Of course, this
> would
> > > require a bump to the message format, and that hasn't happened in a
> > while.
> > > Could we include some information on this approach and why we aren't
> > > pursuing it? I think message format bumps are tricky, but it is worth
> > > calling out that this is also an option.
> > >
> > > Thanks,
> > > Justine
> > >
> > > On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.inva
> > <ma...@amazon.com.inva>lid>
> > > wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > I added a section under Proposed Changes -> Timestamp Validation
> Logic
> > to
> > > > capture how the INVALID_TIMESTAMP is handled in this KIP.
> > > > Please let me know if there are any additional areas you would like
> me
> > to
> > > > address.
> > > >
> > > > Thanks,
> > > > Mehari
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Luke Chen <sh...@apache.org>.
Hi Beyene,

Thanks for the KIP.

Questions:
1. We don't have "*max.message.time.difference.ms
<http://max.message.time.difference.ms>*" config, I think you're referring
to "log.message.timestamp.difference.max.ms"?
2. After this KIP, the semantics of "log.message.timestamp.difference.max.ms"
will be changed, right?
We should also mention it in this KIP, maybe compatibility section?
And I think the description of "log.message.timestamp.difference.max.ms"
will also need to be updated.
3. Personally I like to see the "TIME_DRIFT_TOLERANCE" to be exposed as a
config, since after this change, the root issue is still not completed
resolved.
After this KIP, the 1 hour later of timestamp can still be appended
successfully, which might still be an issue for some applications.
I'd propose to introduce 2 new configs, one is "
log.message.timestamp.before.max.ms", and the other one is "
log.message.timestamp.after.max.ms".
And then we deprecate "log.message.timestamp.difference.max.ms". WDYT?

Thank you.
Luke

On Tue, Jun 6, 2023 at 8:02 AM Beyene, Mehari <me...@amazon.com.invalid>
wrote:

> Hey Justine and Divij,
>
> Thank you for the recommendations.
> I've made the updates to the KIP and added a new section called "Future
> Work: Update Message Format to Include Both Client Timestamp and LogAppend
> Timestamp."
>
> Please take a look when get some time and let me know if there's anything
> else you'd like me to address.
>
> Thanks,
> Mehari
>
> On 6/5/23, 10:16 AM, "Divij Vaidya" <divijvaidya13@gmail.com <mailto:
> divijvaidya13@gmail.com>> wrote:
>
>
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
>
>
>
>
>
> Hey Justine
>
>
> Thank you for bringing this up. We had a discussion earlier in this [1]
> thread and concluded that bumping up the message version is a very
> expensive operation. Hence, we want to bundle together a bunch of
> impactful changes that we will perform on the message version and change it
> in v4.0. We are currently capturing the ideas here [2]. The idea of always
> having a log append time is captured in point 4 in the above wiki of ideas.
>
>
> As you suggested, we will add a new section called "future work" and add
> the idea of two timestamps (& why not do it now) over there. Meanwhile,
> does the above explanation answer your question on why not to do it right
> now?
>
>
> [1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh <
> https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh>
> [2]
>
> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
> <
> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
> >
>
>
>
>
> --
> Divij Vaidya
>
>
>
>
>
>
> On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.inva
> <ma...@confluent.io.inva>lid>
> wrote:
>
>
> > Hey Mehari,
> > Thanks for adding that section. I think one other thing folks have
> > considered is including two timestamps in the message format -- one for
> the
> > client side timestamp and one for the server side. Of course, this would
> > require a bump to the message format, and that hasn't happened in a
> while.
> > Could we include some information on this approach and why we aren't
> > pursuing it? I think message format bumps are tricky, but it is worth
> > calling out that this is also an option.
> >
> > Thanks,
> > Justine
> >
> > On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.inva
> <ma...@amazon.com.inva>lid>
> > wrote:
> >
> > > Hi Justine,
> > >
> > > I added a section under Proposed Changes -> Timestamp Validation Logic
> to
> > > capture how the INVALID_TIMESTAMP is handled in this KIP.
> > > Please let me know if there are any additional areas you would like me
> to
> > > address.
> > >
> > > Thanks,
> > > Mehari
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Hey Justine and Divij,

Thank you for the recommendations. 
I've made the updates to the KIP and added a new section called "Future Work: Update Message Format to Include Both Client Timestamp and LogAppend Timestamp."

Please take a look when get some time and let me know if there's anything else you'd like me to address.

Thanks,
Mehari

On 6/5/23, 10:16 AM, "Divij Vaidya" <divijvaidya13@gmail.com <ma...@gmail.com>> wrote:


CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.






Hey Justine


Thank you for bringing this up. We had a discussion earlier in this [1]
thread and concluded that bumping up the message version is a very
expensive operation. Hence, we want to bundle together a bunch of
impactful changes that we will perform on the message version and change it
in v4.0. We are currently capturing the ideas here [2]. The idea of always
having a log append time is captured in point 4 in the above wiki of ideas.


As you suggested, we will add a new section called "future work" and add
the idea of two timestamps (& why not do it now) over there. Meanwhile,
does the above explanation answer your question on why not to do it right
now?


[1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh <https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh>
[2]
https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3 <https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3>




--
Divij Vaidya






On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.inva <ma...@confluent.io.inva>lid>
wrote:


> Hey Mehari,
> Thanks for adding that section. I think one other thing folks have
> considered is including two timestamps in the message format -- one for the
> client side timestamp and one for the server side. Of course, this would
> require a bump to the message format, and that hasn't happened in a while.
> Could we include some information on this approach and why we aren't
> pursuing it? I think message format bumps are tricky, but it is worth
> calling out that this is also an option.
>
> Thanks,
> Justine
>
> On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.inva <ma...@amazon.com.inva>lid>
> wrote:
>
> > Hi Justine,
> >
> > I added a section under Proposed Changes -> Timestamp Validation Logic to
> > capture how the INVALID_TIMESTAMP is handled in this KIP.
> > Please let me know if there are any additional areas you would like me to
> > address.
> >
> > Thanks,
> > Mehari
> >
> >
> >
> >
> >
>




Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hey Divij,
Yeah, this makes sense. Let's just include this in the KIP as well.

Thanks!

On Mon, Jun 5, 2023 at 10:16 AM Divij Vaidya <di...@gmail.com>
wrote:

> Hey Justine
>
> Thank you for bringing this up. We had a discussion earlier in this [1]
> thread and concluded that bumping up the message version is a very
> expensive operation. Hence, we want to bundle together a bunch of
> impactful changes that we will perform on the message version and change it
> in v4.0. We are currently capturing the ideas here [2]. The idea of always
> having a log append time is captured in point 4 in the above wiki of ideas.
>
> As you suggested, we will add a new section called "future work" and add
> the idea of two timestamps (& why not do it now) over there. Meanwhile,
> does the above explanation answer your question on why not to do it right
> now?
>
> [1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh
> [2]
>
> https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3
>
>
> --
> Divij Vaidya
>
>
>
> On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jolshan@confluent.io.invalid
> >
> wrote:
>
> > Hey Mehari,
> > Thanks for adding that section. I think one other thing folks have
> > considered is including two timestamps in the message format -- one for
> the
> > client side timestamp and one for the server side. Of course, this would
> > require a bump to the message format, and that hasn't happened in a
> while.
> > Could we include some information on this approach and why we aren't
> > pursuing it? I think message format bumps are tricky, but it is worth
> > calling out that this is also an option.
> >
> > Thanks,
> > Justine
> >
> > On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <mehbey@amazon.com.invalid
> >
> > wrote:
> >
> > > Hi Justine,
> > >
> > > I added a section under Proposed Changes -> Timestamp Validation Logic
> to
> > > capture how the INVALID_TIMESTAMP is handled in this KIP.
> > > Please let me know if there are any additional areas you would like me
> to
> > > address.
> > >
> > > Thanks,
> > > Mehari
> > >
> > >
> > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Divij Vaidya <di...@gmail.com>.
Hey Justine

Thank you for bringing this up. We had a discussion earlier in this [1]
thread and concluded that bumping up the message version is a very
expensive operation. Hence, we want to bundle together a bunch of
impactful changes that we will perform on the message version and change it
in v4.0. We are currently capturing the ideas here [2]. The idea of always
having a log append time is captured in point 4 in the above wiki of ideas.

As you suggested, we will add a new section called "future work" and add
the idea of two timestamps (& why not do it now) over there. Meanwhile,
does the above explanation answer your question on why not to do it right
now?

[1] https://lists.apache.org/thread/rxnps10t4vrsor46cx6xdj6t03qqxosh
[2]
https://cwiki.apache.org/confluence/display/KAFKA/ideas+for+kafka+message+format+v.3


-- 
Divij Vaidya



On Mon, Jun 5, 2023 at 6:42 PM Justine Olshan <jo...@confluent.io.invalid>
wrote:

> Hey Mehari,
> Thanks for adding that section. I think one other thing folks have
> considered is including two timestamps in the message format -- one for the
> client side timestamp and one for the server side. Of course, this would
> require a bump to the message format, and that hasn't happened in a while.
> Could we include some information on this approach and why we aren't
> pursuing it? I think message format bumps are tricky, but it is worth
> calling out that this is also an option.
>
> Thanks,
> Justine
>
> On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <me...@amazon.com.invalid>
> wrote:
>
> > Hi Justine,
> >
> > I added a section under Proposed Changes -> Timestamp Validation Logic to
> > capture how the INVALID_TIMESTAMP is handled in this KIP.
> > Please let me know if there are any additional areas you would like me to
> > address.
> >
> > Thanks,
> > Mehari
> >
> >
> >
> >
> >
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hey Mehari,
Thanks for adding that section. I think one other thing folks have
considered is including two timestamps in the message format -- one for the
client side timestamp and one for the server side. Of course, this would
require a bump to the message format, and that hasn't happened in a while.
Could we include some information on this approach and why we aren't
pursuing it? I think message format bumps are tricky, but it is worth
calling out that this is also an option.

Thanks,
Justine

On Fri, Jun 2, 2023 at 4:51 PM Beyene, Mehari <me...@amazon.com.invalid>
wrote:

> Hi Justine,
>
> I added a section under Proposed Changes -> Timestamp Validation Logic to
> capture how the INVALID_TIMESTAMP is handled in this KIP.
> Please let me know if there are any additional areas you would like me to
> address.
>
> Thanks,
> Mehari
>
>
>
>
>

Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Hi Justine, 

I added a section under Proposed Changes -> Timestamp Validation Logic to capture how the INVALID_TIMESTAMP is handled in this KIP.
Please let me know if there are any additional areas you would like me to address.

Thanks,
Mehari





Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by "Beyene, Mehari" <me...@amazon.com.INVALID>.
Thank you for the feedback Justine. Yes, I can expand on the KIP what the behavior is when we return INVALID_TIMESTAMP.


On 5/30/23, 1:34 PM, "Justine Olshan" <jolshan@confluent.io.INVA <ma...@confluent.io.INVA>LID> wrote:


CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.






Hi Mehari,


This is an interesting KIP! I've seen my fair share of issues due to future
timestamps, so this is definitely an area that could be improved.


I noticed in the compatibility section it says:
> There are no changes to public interfaces that will impact clients.
However, this change is considered breaking, as messages with future
timestamps that were previously accepted by the broker will now be rejected.


It seems that INVALID_TIMESTAMP is not retriable so it will fail the batch.
I think that if we expect to handle this case just as we do for the already
existing invalid_timestamp error, we should include some information on how
that is handled in the KIP.


Thanks,
Justine




On Tue, May 30, 2023 at 1:07 PM Beyene, Mehari <mehbey@amazon.com.inva <ma...@amazon.com.inva>lid>
wrote:


> Hi Everyone,
>
> I would like to start a discussion on KIP-937: Improve Message Timestamp
> Validation (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation <https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation>
> ).
> This is a small KIP that aims to tighten the current validation logic of a
> message timestamp.
>
> Thanks,
> Mehari
>




Re: [DISCUSS] KIP-937 Improve Message Timestamp Validation

Posted by Justine Olshan <jo...@confluent.io.INVALID>.
Hi Mehari,

This is an interesting KIP! I've seen my fair share of issues due to future
timestamps, so this is definitely an area that could be improved.

I noticed in the compatibility section it says:
> There are no changes to public interfaces that will impact clients.
However, this change is considered breaking, as messages with future
timestamps that were previously accepted by the broker will now be rejected.

It seems that INVALID_TIMESTAMP is not retriable so it will fail the batch.
I think that if we expect to handle this case just as we do for the already
existing invalid_timestamp error, we should include some information on how
that is handled in the KIP.

Thanks,
Justine


On Tue, May 30, 2023 at 1:07 PM Beyene, Mehari <me...@amazon.com.invalid>
wrote:

> Hi Everyone,
>
> I would like to start a discussion on KIP-937: Improve Message Timestamp
> Validation (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-937%3A+Improve+Message+Timestamp+Validation
> ).
> This is a small KIP that aims to tighten the current validation logic of a
> message timestamp.
>
> Thanks,
> Mehari
>