Posted to dev@kafka.apache.org by John Roesler <vv...@apache.org> on 2020/12/04 22:39:00 UTC

[DISCUSS] KIP-695: Improve Streams Time Synchronization

Hello all,

I'd like to propose KIP-695 to improve on the "task idling" feature we
introduced in KIP-353. This KIP will allow Streams to offer deterministic
time semantics in join-type topologies. For example, it makes sure that
when you join two topics, we collate the topics by timestamp. That
was always the intent with task idling (KIP-353), but it turns out the
previous mechanism couldn't provide the desired semantics.
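
To illustrate what "collate by timestamp" means here, the following is a minimal sketch, not actual Streams code. The class `TimestampCollation`, the `Rec` type, and the `collate` method are hypothetical names made up for this example. It assumes each input buffer is already ordered by timestamp and stops as soon as one side runs empty, which is exactly the situation in which task idling has to decide whether to wait.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TimestampCollation {
    /** A record reduced to the two fields this sketch needs (hypothetical type). */
    static final class Rec {
        final String topic;
        final long timestamp;

        Rec(String topic, long timestamp) {
            this.topic = topic;
            this.timestamp = timestamp;
        }
    }

    /**
     * Drains two per-topic buffers, each already ordered by timestamp,
     * always emitting the record with the smaller timestamp next.
     * Ties go to the left buffer, an arbitrary choice for this sketch.
     */
    static List<Rec> collate(Deque<Rec> left, Deque<Rec> right) {
        List<Rec> out = new ArrayList<>();
        // Choose only while both sides have data. Once one side is empty we
        // cannot know whether its next record would have an earlier timestamp.
        while (!left.isEmpty() && !right.isEmpty()) {
            if (left.peekFirst().timestamp <= right.peekFirst().timestamp) {
                out.add(left.pollFirst());
            } else {
                out.add(right.pollFirst());
            }
        }
        return out;
    }
}
```

Anything left in the non-empty buffer when the loop stops is held back rather than emitted, since emitting it early is what would make the join order non-deterministic.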

The details are here:
https://cwiki.apache.org/confluence/x/JSXZCQ

I look forward to your feedback!

Thanks,
-John

Re: [DISCUSS] KIP-695: Improve Streams Time Synchronization

Posted by John Roesler <vv...@apache.org>.

Thanks for taking a look, Bruno!

You have a sharp eye. All I meant by that is that
we don't want to draw conclusions from metadata that
we received a long time ago (for example, if fetches have
been failing), but we also don't want to enforce waiting on
a new fetch every single time we process a task with an
empty partition.

I didn't want to introduce a new configuration option to
govern the staleness of fetch metadata because it doesn't
fundamentally affect the guarantees of task idling and it
should also be possible to make a heuristic based on other
configs.

The main point of the KIP is to force us to wait for data
that we know to be on the brokers. When it comes to waiting
for new data to be produced to the brokers, we can only
provide an approximation. This interaction is free of
synchronization between the clients and brokers, so even
with an extremely strict bound on staleness, we could never
guarantee to poll records that were produced in close
proximity to the idling timeout.

Therefore, I wanted to leave a little liberty in the KIP to
let us adjust the heuristic dynamically, or in response to
benchmarks or field feedback, etc.

I'll update the KIP to specify that the exact determination
of staleness is left as an implementation detail.
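
To make the reasoning above concrete, here is a rough sketch of the decision for a task with one empty input partition. It is only an illustration under stated assumptions, not the KIP's implementation: the class `IdleDecision`, the `Action` enum, and every parameter name are hypothetical, and the fixed `maxMetadataAgeMs` threshold in particular is an assumption, since the exact determination of staleness is left open.

```java
final class IdleDecision {
    enum Action { WAIT_FOR_FETCH, IDLE, PROCESS }

    /**
     * Decides what to do when one input partition of a task is empty.
     *
     * @param knownLag         records known to be on the broker but not yet fetched
     * @param metadataAgeMs    time since that lag information was received
     * @param maxMetadataAgeMs hypothetical staleness bound (assumption of this sketch)
     * @param idledMs          time the task has already spent idling
     * @param idleTimeoutMs    the idling timeout
     */
    static Action decide(long knownLag,
                         long metadataAgeMs,
                         long maxMetadataAgeMs,
                         long idledMs,
                         long idleTimeoutMs) {
        // Metadata received too long ago (for example, because fetches have
        // been failing) tells us nothing reliable, so wait for a fresh fetch.
        if (metadataAgeMs > maxMetadataAgeMs) {
            return Action.WAIT_FOR_FETCH;
        }
        // Data we know to be on the brokers must be waited for.
        if (knownLag > 0) {
            return Action.WAIT_FOR_FETCH;
        }
        // Nothing known to be pending: idle for a bounded time in case new
        // data is produced. This part can only ever be an approximation.
        if (idledMs < idleTimeoutMs) {
            return Action.IDLE;
        }
        return Action.PROCESS;
    }
}
```

For example, `IdleDecision.decide(0, 10, 100, 20, 50)` yields `IDLE`: the metadata is fresh and shows no pending data, but the idling timeout has not yet elapsed.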

Thanks for the feedback,
-John

On Tue, 2020-12-08 at 10:21 +0100, Bruno Cadonna wrote:
> Thank you for the KIP, John!
> 
> Overall, the KIP looks good to me.
> I was just wondering what you mean by "too stale". Could you define
> "too stale"?
> 
> Best,
> Bruno

Re: [DISCUSS] KIP-695: Improve Streams Time Synchronization

Posted by Bruno Cadonna <br...@confluent.io>.

Thank you for the KIP, John!

Overall, the KIP looks good to me.
I was just wondering what you mean by "too stale". Could you define
"too stale"?

Best,
Bruno

