You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Marco Aurélio Lotz <co...@marcolotz.com> on 2021/03/02 09:06:31 UTC

[DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Hi folks,

I would like to invite everyone to discuss further KIP-718:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated

I welcome all feedback on it.

Kind Regards,
Marco Lotz

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Marco Aurélio Lotz <co...@marcolotz.com>.
Hi Matthias,


Originally I had written a really long answer against deprecations of the
method. When I finished writing, I realised that I actually agree with all
your points.


As you said, if a user is worried about how a table store is persisted, she
will almost always also be worried about how the subscription store is
persisted. Thus we should deprecate the method with only table store
materialization.


I will change the KIP to reflect that. I will also add that we need to
provide information to the users-docs about subscription stores, since I
couldn’t find anything on Kafka documentation about it.


One small question - with us deprecating the API call, should we still have
a bug-fix to piggyback on the materialisation method provided to the
table-store? Not sure if you plan to have it removed at 3.0.0 or would like
to have a longer deprecation time.


Cheers,

Marco Lotz

On Wed, Apr 14, 2021 at 11:29 PM Matthias J. Sax <mj...@apache.org> wrote:

> If you argue this way, would the consequence not be that we would need
> to add even more overloads that allow to only use Materialized for the
> subscription store?
>
> -> I only want to switch to in-memory for the subscription store: Why do
> I need to set Materialized for the main store?
>
> Btw: the above is not something I would really argue for.
>
>
> The surface area of our public API is quite big and tend to overwhelm
> users, and thus I would like to keep it small is possible.
>
>
> > Must the user know about the subscription-stores beforehand? If yes, then
> > indeed we should deprecate.
>
> I tend to argue "yes".
>
>
> Keeping the old methods seems to be a rare use case IMHO. It seems most
> likely that user want to overwrite all stores. And thus, it seems to be
> error prone: I want to overwrite the stores, but I forget to overwrite
> the subscription store.
>
> And even if one wants to keep RocksDB for the subscription stores only,
> they can still pass in both Materialized object using the new methods.
>
>
>
> -Matthias
>
>
> On 4/14/21 1:23 PM, Marco Aurélio Lotz wrote:
> > Hi Matthias,
> >
> > Thanks for reviewing it. That's a nice question. I can provide you the
> > answer as a user of the API - but of course it very much depends on what
> > the original intention towards the user is.
> > The only reason I found out it was opinionated was when my tests using
> > in-memory materialization started failing due to the state-store clean-up
> > bug
> > <
> https://stackoverflow.com/questions/50602512/failed-to-delete-the-state-directory-in-ide-for-kafka-stream-application
> >
> > in Windows systems.
> >
> > To be honest, I think I wouldn't have investigated how subscription
> stores
> > are used without seeing that my system couldn't handle the expected data.
> > Must the user know about the subscription-stores beforehand? If yes, then
> > indeed we should deprecate.
> >
> > Having said that, it seems to me that deprecating would not be consistent
> > with other parts of the API that have parameterized methods hiding
> > complexity.
> > E.g. from the top of my head, join() will make a rocksDB materialization
> if
> > no-materialization method is provided.
> >
> > public <V1, R> KTable<K, R> join(final KTable<K, V1> other, final
> > ValueJoiner<? super V, ? super V1, ? extends R> joiner0
> >
> >
> > I see this as something equivalent. If the user didn't provide all the
> > methods, shouldn't it be fine to fallback to a default materialization
> > method as the other sections of the API already does?
> > In that scenario, there's no need to deprecate and we just need to make a
> > reasonable guess towards it. Using the same method as the table-store
> > sounds like a reasonable guess in this scenario.
> >
> > In my perspective as a user, I would prefer to have simpler methods
> knowing
> > I would be losing some fine-grain configuration. Does it sound reasonable
> > to you?
> >
> > Cheers,
> > Marco Lotz
> >
> > On Wed, Apr 14, 2021 at 8:47 PM Matthias J. Sax <mj...@apache.org>
> wrote:
> >
> >> I just re-read the KIP, and I am wondering why we would *only add* new
> >> methods.
> >>
> >> Is it an expected use case to only change the main stores but not the
> >> subscription stores?
> >>
> >> Wondering if we should deprecate the existing methods that only accept a
> >> single `Materialized` parameter?
> >>
> >>
> >> -Matthias
> >>
> >> On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote:
> >>> Hi everyone,
> >>>
> >>> I am fine with sticking with Materialized and adding the info to the
> >>> javadoc then - so we keep the footprint smaller.
> >>> I will then update the KIP to match what we discussed here and send it
> >> for
> >>> a vote next week.
> >>>
> >>> I will raise a new bug-fix ticket and change KAFKA-10383
> >>> <https://issues.apache.org/jira/browse/KAFKA-10383> to become a
> feature
> >> -
> >>> if that's ok.
> >>>
> >>> Cheers,
> >>> Marco
> >>>
> >>> On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org>
> wrote:
> >>>
> >>>> Just catching up here.
> >>>>
> >>>> I agree that we have two issue, and the first (align subscription
> store
> >>>> to main store) can be done as a bug-fix.
> >>>>
> >>>> For the KIP (that addressed the second), I tend to agree that reusing
> >>>> `Materialized` might be better as it would keep the API surface area
> >>>> smaller.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 4/6/21 8:48 AM, John Roesler wrote:
> >>>>> Hi Marco,
> >>>>>
> >>>>> Just a quick clarification: I just reviewed the Materialized class.
> It
> >>>> looks like the only undesirable members are:
> >>>>> 1. Retention
> >>>>> 2. Key/Value serdes
> >>>>>
> >>>>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” ,
> for
> >>>> which case the withRetention javadoc already says it’s ignored.
> >>>>>
> >>>>> Perhaps we could just stick with Materialized by adding a note to the
> >>>> Key/Value serdes setters that they are ignored for FKJoin subscription
> >>>> stores?
> >>>>>
> >>>>> Not as elegant as a new config class, but these config classes
> actually
> >>>> bring a fair amount of complexity, so it might be nice to avoid a new
> >> one.
> >>>>>
> >>>>> Thanks,
> >>>>> John
> >>>>>
> >>>>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
> >>>>>> Hi John / Guozhang,
> >>>>>>
> >>>>>> If I correctly understood John's message, he agrees on having the
> two
> >>>>>> scenarios (piggy-back and api extension). In my view, these two
> >>>> scenarios
> >>>>>> are separate tasks - the first one is a bug-fix and the second is an
> >>>>>> improvement on the current API.
> >>>>>>
> >>>>>> - bug-fix: On the current API, we change its implementation to piggy
> >>>> back
> >>>>>> on the materialization method provided to the materialized
> parameter.
> >>>> This
> >>>>>> way it will not be opinionated anymore and will not force RocksDb
> >>>>>> persistence for subscription store. Thus an in-memory materialized
> >>>>>> parameter would imply an in-memory subscription store, for example.
> >>>> From my
> >>>>>> understanding, the original implementation tried to be as
> unopionated
> >>>>>> towards storage methods as possible - and the current implementation
> >> is
> >>>> not
> >>>>>> allowing that. Was that the case? We would still need to add this
> >>>>>> modification to the update notes, since it may affect some
> >> deployments.
> >>>>>>
> >>>>>> - improvement: We extend the API to allow a user to fine tune
> >> different
> >>>>>> materialization methods for subscription and join store. This is
> done
> >> by
> >>>>>> adding a new parameter to the associated methods.
> >>>>>>
> >>>>>> Does it sound reasonable to you Guozhang?
> >>>>>> On your question, does it make sense for an user to decide retention
> >>>>>> policies (withRetention method) or caching for subscription stores?
> I
> >>>> can
> >>>>>> see why to finetune Logging for example, but in a first moment not
> >> these
> >>>>>> other behaviours. That's why I am unsure about using Materialized
> >> class.
> >>>>>>
> >>>>>> @John, I will update the KIP with your points as soon as we clarify
> >>>> this.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Marco
> >>>>>>
> >>>>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> Thanks Marco / John,
> >>>>>>>
> >>>>>>> I think the arguments for not piggy-backing on the existing
> >>>> Materialized
> >>>>>>> makes sense; on the other hand, if we go this route should we just
> >> use
> >>>> a
> >>>>>>> separate Materialized than using an extended /
> >>>>>>> narrowed-scoped MaterializedSubscription since it seems we want to
> >>>> allow
> >>>>>>> users to fully customize this store?
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Marco,
> >>>>>>>>
> >>>>>>>> Sorry if I caused any trouble!
> >>>>>>>>
> >>>>>>>> I don’t remember what I was thinking before, but reasoning about
> it
> >>>> now,
> >>>>>>>> you might need the fine-grained choice if:
> >>>>>>>>
> >>>>>>>> 1. The number or size of records in each partition of both tables
> is
> >>>>>>>> small(ish), but the cardinality of the join is very high. Then you
> >>>> might
> >>>>>>>> want an in-memory table store, but an on-disk subscription store.
> >>>>>>>>
> >>>>>>>> 2. The number or size of records is very large, but the join
> >>>> cardinality
> >>>>>>>> is low. Then you might need an on-disk table store, but an
> in-memory
> >>>>>>>> subscription store.
> >>>>>>>>
> >>>>>>>> 3. You might want a different kind (or differently configured)
> store
> >>>> for
> >>>>>>>> the subscription store, since it’s access pattern is so different.
> >>>>>>>>
> >>>>>>>> If you buy these, it might be good to put the justification into
> the
> >>>> KIP.
> >>>>>>>>
> >>>>>>>> I’m in favor of the default you’ve proposed.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> John
> >>>>>>>>
> >>>>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> >>>>>>>>> Hi Guozhang,
> >>>>>>>>>
> >>>>>>>>> Apologies for the late answer. Originally that was my proposal -
> to
> >>>>>>>>> piggyback on the provided materialisation method (
> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
> >>>>>>>>> John Roesler suggested to us to provide even further fine tuning
> on
> >>>> API
> >>>>>>>>> level parameters. Maybe we could see this as two sides of the
> same
> >>>>>>> coin:
> >>>>>>>>>
> >>>>>>>>> - On the current API, we change it to piggy back on the
> >>>> materialization
> >>>>>>>>> method provided to the join store.
> >>>>>>>>> - We extend the API to allow a user to fine tune different
> >>>>>>>> materialization
> >>>>>>>>> methods for subscription and join store.
> >>>>>>>>>
> >>>>>>>>> What do you think?
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Marco
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wangguoz@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Marco,
> >>>>>>>>>>
> >>>>>>>>>> Just a quick thought: what if we reuse the existing Materialized
> >>>>>>>> object for
> >>>>>>>>>> both subscription and join stores, instead of introducing a new
> >>>>>>> param /
> >>>>>>>>>> class?
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> >>>>>>>> contact@marcolotz.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to invite everyone to discuss further KIP-718:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >>>>>>>>>>>
> >>>>>>>>>>> I welcome all feedback on it.
> >>>>>>>>>>>
> >>>>>>>>>>> Kind Regards,
> >>>>>>>>>>> Marco Lotz
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> -- Guozhang
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> -- Guozhang
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by "Matthias J. Sax" <mj...@apache.org>.
If you argue this way, would the consequence not be that we would need
to add even more overloads that allow to only use Materialized for the
subscription store?

-> I only want to switch to in-memory for the subscription store: Why do
I need to set Materialized for the main store?

Btw: the above is not something I would really argue for.


The surface area of our public API is quite big and tend to overwhelm
users, and thus I would like to keep it small is possible.


> Must the user know about the subscription-stores beforehand? If yes, then
> indeed we should deprecate.

I tend to argue "yes".


Keeping the old methods seems to be a rare use case IMHO. It seems most
likely that user want to overwrite all stores. And thus, it seems to be
error prone: I want to overwrite the stores, but I forget to overwrite
the subscription store.

And even if one wants to keep RocksDB for the subscription stores only,
they can still pass in both Materialized object using the new methods.



-Matthias


On 4/14/21 1:23 PM, Marco Aurélio Lotz wrote:
> Hi Matthias,
> 
> Thanks for reviewing it. That's a nice question. I can provide you the
> answer as a user of the API - but of course it very much depends on what
> the original intention towards the user is.
> The only reason I found out it was opinionated was when my tests using
> in-memory materialization started failing due to the state-store clean-up
> bug
> <https://stackoverflow.com/questions/50602512/failed-to-delete-the-state-directory-in-ide-for-kafka-stream-application>
> in Windows systems.
> 
> To be honest, I think I wouldn't have investigated how subscription stores
> are used without seeing that my system couldn't handle the expected data.
> Must the user know about the subscription-stores beforehand? If yes, then
> indeed we should deprecate.
> 
> Having said that, it seems to me that deprecating would not be consistent
> with other parts of the API that have parameterized methods hiding
> complexity.
> E.g. from the top of my head, join() will make a rocksDB materialization if
> no-materialization method is provided.
> 
> public <V1, R> KTable<K, R> join(final KTable<K, V1> other, final
> ValueJoiner<? super V, ? super V1, ? extends R> joiner0
> 
> 
> I see this as something equivalent. If the user didn't provide all the
> methods, shouldn't it be fine to fallback to a default materialization
> method as the other sections of the API already does?
> In that scenario, there's no need to deprecate and we just need to make a
> reasonable guess towards it. Using the same method as the table-store
> sounds like a reasonable guess in this scenario.
> 
> In my perspective as a user, I would prefer to have simpler methods knowing
> I would be losing some fine-grain configuration. Does it sound reasonable
> to you?
> 
> Cheers,
> Marco Lotz
> 
> On Wed, Apr 14, 2021 at 8:47 PM Matthias J. Sax <mj...@apache.org> wrote:
> 
>> I just re-read the KIP, and I am wondering why we would *only add* new
>> methods.
>>
>> Is it an expected use case to only change the main stores but not the
>> subscription stores?
>>
>> Wondering if we should deprecate the existing methods that only accept a
>> single `Materialized` parameter?
>>
>>
>> -Matthias
>>
>> On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote:
>>> Hi everyone,
>>>
>>> I am fine with sticking with Materialized and adding the info to the
>>> javadoc then - so we keep the footprint smaller.
>>> I will then update the KIP to match what we discussed here and send it
>> for
>>> a vote next week.
>>>
>>> I will raise a new bug-fix ticket and change KAFKA-10383
>>> <https://issues.apache.org/jira/browse/KAFKA-10383> to become a feature
>> -
>>> if that's ok.
>>>
>>> Cheers,
>>> Marco
>>>
>>> On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>>> Just catching up here.
>>>>
>>>> I agree that we have two issue, and the first (align subscription store
>>>> to main store) can be done as a bug-fix.
>>>>
>>>> For the KIP (that addressed the second), I tend to agree that reusing
>>>> `Materialized` might be better as it would keep the API surface area
>>>> smaller.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 4/6/21 8:48 AM, John Roesler wrote:
>>>>> Hi Marco,
>>>>>
>>>>> Just a quick clarification: I just reviewed the Materialized class. It
>>>> looks like the only undesirable members are:
>>>>> 1. Retention
>>>>> 2. Key/Value serdes
>>>>>
>>>>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for
>>>> which case the withRetention javadoc already says it’s ignored.
>>>>>
>>>>> Perhaps we could just stick with Materialized by adding a note to the
>>>> Key/Value serdes setters that they are ignored for FKJoin subscription
>>>> stores?
>>>>>
>>>>> Not as elegant as a new config class, but these config classes actually
>>>> bring a fair amount of complexity, so it might be nice to avoid a new
>> one.
>>>>>
>>>>> Thanks,
>>>>> John
>>>>>
>>>>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
>>>>>> Hi John / Guozhang,
>>>>>>
>>>>>> If I correctly understood John's message, he agrees on having the two
>>>>>> scenarios (piggy-back and api extension). In my view, these two
>>>> scenarios
>>>>>> are separate tasks - the first one is a bug-fix and the second is an
>>>>>> improvement on the current API.
>>>>>>
>>>>>> - bug-fix: On the current API, we change its implementation to piggy
>>>> back
>>>>>> on the materialization method provided to the materialized parameter.
>>>> This
>>>>>> way it will not be opinionated anymore and will not force RocksDb
>>>>>> persistence for subscription store. Thus an in-memory materialized
>>>>>> parameter would imply an in-memory subscription store, for example.
>>>> From my
>>>>>> understanding, the original implementation tried to be as unopionated
>>>>>> towards storage methods as possible - and the current implementation
>> is
>>>> not
>>>>>> allowing that. Was that the case? We would still need to add this
>>>>>> modification to the update notes, since it may affect some
>> deployments.
>>>>>>
>>>>>> - improvement: We extend the API to allow a user to fine tune
>> different
>>>>>> materialization methods for subscription and join store. This is done
>> by
>>>>>> adding a new parameter to the associated methods.
>>>>>>
>>>>>> Does it sound reasonable to you Guozhang?
>>>>>> On your question, does it make sense for an user to decide retention
>>>>>> policies (withRetention method) or caching for subscription stores? I
>>>> can
>>>>>> see why to finetune Logging for example, but in a first moment not
>> these
>>>>>> other behaviours. That's why I am unsure about using Materialized
>> class.
>>>>>>
>>>>>> @John, I will update the KIP with your points as soon as we clarify
>>>> this.
>>>>>>
>>>>>> Cheers,
>>>>>> Marco
>>>>>>
>>>>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com>
>>>> wrote:
>>>>>>
>>>>>>> Thanks Marco / John,
>>>>>>>
>>>>>>> I think the arguments for not piggy-backing on the existing
>>>> Materialized
>>>>>>> makes sense; on the other hand, if we go this route should we just
>> use
>>>> a
>>>>>>> separate Materialized than using an extended /
>>>>>>> narrowed-scoped MaterializedSubscription since it seems we want to
>>>> allow
>>>>>>> users to fully customize this store?
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org>
>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Marco,
>>>>>>>>
>>>>>>>> Sorry if I caused any trouble!
>>>>>>>>
>>>>>>>> I don’t remember what I was thinking before, but reasoning about it
>>>> now,
>>>>>>>> you might need the fine-grained choice if:
>>>>>>>>
>>>>>>>> 1. The number or size of records in each partition of both tables is
>>>>>>>> small(ish), but the cardinality of the join is very high. Then you
>>>> might
>>>>>>>> want an in-memory table store, but an on-disk subscription store.
>>>>>>>>
>>>>>>>> 2. The number or size of records is very large, but the join
>>>> cardinality
>>>>>>>> is low. Then you might need an on-disk table store, but an in-memory
>>>>>>>> subscription store.
>>>>>>>>
>>>>>>>> 3. You might want a different kind (or differently configured) store
>>>> for
>>>>>>>> the subscription store, since it’s access pattern is so different.
>>>>>>>>
>>>>>>>> If you buy these, it might be good to put the justification into the
>>>> KIP.
>>>>>>>>
>>>>>>>> I’m in favor of the default you’ve proposed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
>>>>>>>>> Hi Guozhang,
>>>>>>>>>
>>>>>>>>> Apologies for the late answer. Originally that was my proposal - to
>>>>>>>>> piggyback on the provided materialisation method (
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
>>>>>>>>> John Roesler suggested to us to provide even further fine tuning on
>>>> API
>>>>>>>>> level parameters. Maybe we could see this as two sides of the same
>>>>>>> coin:
>>>>>>>>>
>>>>>>>>> - On the current API, we change it to piggy back on the
>>>> materialization
>>>>>>>>> method provided to the join store.
>>>>>>>>> - We extend the API to allow a user to fine tune different
>>>>>>>> materialization
>>>>>>>>> methods for subscription and join store.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Marco
>>>>>>>>>
>>>>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Marco,
>>>>>>>>>>
>>>>>>>>>> Just a quick thought: what if we reuse the existing Materialized
>>>>>>>> object for
>>>>>>>>>> both subscription and join stores, instead of introducing a new
>>>>>>> param /
>>>>>>>>>> class?
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
>>>>>>>> contact@marcolotz.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> I would like to invite everyone to discuss further KIP-718:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
>>>>>>>>>>>
>>>>>>>>>>> I welcome all feedback on it.
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>> Marco Lotz
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>>>>>>
>>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Marco Aurélio Lotz <ma...@marcolotz.com>.
Hi Matthias,

Thanks for reviewing it. That's a nice question. I can provide you the
answer as a user of the API - but of course it very much depends on what
the original intention towards the user is.
The only reason I found out it was opinionated was when my tests using
in-memory materialization started failing due to the state-store clean-up
bug
<https://stackoverflow.com/questions/50602512/failed-to-delete-the-state-directory-in-ide-for-kafka-stream-application>
in Windows systems.

To be honest, I think I wouldn't have investigated how subscription stores
are used without seeing that my system couldn't handle the expected data.
Must the user know about the subscription-stores beforehand? If yes, then
indeed we should deprecate.

Having said that, it seems to me that deprecating would not be consistent
with other parts of the API that have parameterized methods hiding
complexity.
E.g. from the top of my head, join() will make a rocksDB materialization if
no-materialization method is provided.

public <V1, R> KTable<K, R> join(final KTable<K, V1> other, final
ValueJoiner<? super V, ? super V1, ? extends R> joiner0


I see this as something equivalent. If the user didn't provide all the
methods, shouldn't it be fine to fallback to a default materialization
method as the other sections of the API already does?
In that scenario, there's no need to deprecate and we just need to make a
reasonable guess towards it. Using the same method as the table-store
sounds like a reasonable guess in this scenario.

In my perspective as a user, I would prefer to have simpler methods knowing
I would be losing some fine-grain configuration. Does it sound reasonable
to you?

Cheers,
Marco Lotz

On Wed, Apr 14, 2021 at 8:47 PM Matthias J. Sax <mj...@apache.org> wrote:

> I just re-read the KIP, and I am wondering why we would *only add* new
> methods.
>
> Is it an expected use case to only change the main stores but not the
> subscription stores?
>
> Wondering if we should deprecate the existing methods that only accept a
> single `Materialized` parameter?
>
>
> -Matthias
>
> On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote:
> > Hi everyone,
> >
> > I am fine with sticking with Materialized and adding the info to the
> > javadoc then - so we keep the footprint smaller.
> > I will then update the KIP to match what we discussed here and send it
> for
> > a vote next week.
> >
> > I will raise a new bug-fix ticket and change KAFKA-10383
> > <https://issues.apache.org/jira/browse/KAFKA-10383> to become a feature
> -
> > if that's ok.
> >
> > Cheers,
> > Marco
> >
> > On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Just catching up here.
> >>
> >> I agree that we have two issue, and the first (align subscription store
> >> to main store) can be done as a bug-fix.
> >>
> >> For the KIP (that addressed the second), I tend to agree that reusing
> >> `Materialized` might be better as it would keep the API surface area
> >> smaller.
> >>
> >>
> >> -Matthias
> >>
> >> On 4/6/21 8:48 AM, John Roesler wrote:
> >>> Hi Marco,
> >>>
> >>> Just a quick clarification: I just reviewed the Materialized class. It
> >> looks like the only undesirable members are:
> >>> 1. Retention
> >>> 2. Key/Value serdes
> >>>
> >>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for
> >> which case the withRetention javadoc already says it’s ignored.
> >>>
> >>> Perhaps we could just stick with Materialized by adding a note to the
> >> Key/Value serdes setters that they are ignored for FKJoin subscription
> >> stores?
> >>>
> >>> Not as elegant as a new config class, but these config classes actually
> >> bring a fair amount of complexity, so it might be nice to avoid a new
> one.
> >>>
> >>> Thanks,
> >>> John
> >>>
> >>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
> >>>> Hi John / Guozhang,
> >>>>
> >>>> If I correctly understood John's message, he agrees on having the two
> >>>> scenarios (piggy-back and api extension). In my view, these two
> >> scenarios
> >>>> are separate tasks - the first one is a bug-fix and the second is an
> >>>> improvement on the current API.
> >>>>
> >>>> - bug-fix: On the current API, we change its implementation to piggy
> >> back
> >>>> on the materialization method provided to the materialized parameter.
> >> This
> >>>> way it will not be opinionated anymore and will not force RocksDb
> >>>> persistence for subscription store. Thus an in-memory materialized
> >>>> parameter would imply an in-memory subscription store, for example.
> >> From my
> >>>> understanding, the original implementation tried to be as unopionated
> >>>> towards storage methods as possible - and the current implementation
> is
> >> not
> >>>> allowing that. Was that the case? We would still need to add this
> >>>> modification to the update notes, since it may affect some
> deployments.
> >>>>
> >>>> - improvement: We extend the API to allow a user to fine tune
> different
> >>>> materialization methods for subscription and join store. This is done
> by
> >>>> adding a new parameter to the associated methods.
> >>>>
> >>>> Does it sound reasonable to you Guozhang?
> >>>> On your question, does it make sense for an user to decide retention
> >>>> policies (withRetention method) or caching for subscription stores? I
> >> can
> >>>> see why to finetune Logging for example, but in a first moment not
> these
> >>>> other behaviours. That's why I am unsure about using Materialized
> class.
> >>>>
> >>>> @John, I will update the KIP with your points as soon as we clarify
> >> this.
> >>>>
> >>>> Cheers,
> >>>> Marco
> >>>>
> >>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com>
> >> wrote:
> >>>>
> >>>>> Thanks Marco / John,
> >>>>>
> >>>>> I think the arguments for not piggy-backing on the existing
> >> Materialized
> >>>>> makes sense; on the other hand, if we go this route should we just
> use
> >> a
> >>>>> separate Materialized than using an extended /
> >>>>> narrowed-scoped MaterializedSubscription since it seems we want to
> >> allow
> >>>>> users to fully customize this store?
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Thanks Marco,
> >>>>>>
> >>>>>> Sorry if I caused any trouble!
> >>>>>>
> >>>>>> I don’t remember what I was thinking before, but reasoning about it
> >> now,
> >>>>>> you might need the fine-grained choice if:
> >>>>>>
> >>>>>> 1. The number or size of records in each partition of both tables is
> >>>>>> small(ish), but the cardinality of the join is very high. Then you
> >> might
> >>>>>> want an in-memory table store, but an on-disk subscription store.
> >>>>>>
> >>>>>> 2. The number or size of records is very large, but the join
> >> cardinality
> >>>>>> is low. Then you might need an on-disk table store, but an in-memory
> >>>>>> subscription store.
> >>>>>>
> >>>>>> 3. You might want a different kind (or differently configured) store
> >> for
> >>>>>> the subscription store, since it’s access pattern is so different.
> >>>>>>
> >>>>>> If you buy these, it might be good to put the justification into the
> >> KIP.
> >>>>>>
> >>>>>> I’m in favor of the default you’ve proposed.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> John
> >>>>>>
> >>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> >>>>>>> Hi Guozhang,
> >>>>>>>
> >>>>>>> Apologies for the late answer. Originally that was my proposal - to
> >>>>>>> piggyback on the provided materialisation method (
> >>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
> >>>>>>> John Roesler suggested to us to provide even further fine tuning on
> >> API
> >>>>>>> level parameters. Maybe we could see this as two sides of the same
> >>>>> coin:
> >>>>>>>
> >>>>>>> - On the current API, we change it to piggy back on the
> >> materialization
> >>>>>>> method provided to the join store.
> >>>>>>> - We extend the API to allow a user to fine tune different
> >>>>>> materialization
> >>>>>>> methods for subscription and join store.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Marco
> >>>>>>>
> >>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Marco,
> >>>>>>>>
> >>>>>>>> Just a quick thought: what if we reuse the existing Materialized
> >>>>>> object for
> >>>>>>>> both subscription and join stores, instead of introducing a new
> >>>>> param /
> >>>>>>>> class?
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> >>>>>> contact@marcolotz.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>> I would like to invite everyone to discuss further KIP-718:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >>>>>>>>>
> >>>>>>>>> I welcome all feedback on it.
> >>>>>>>>>
> >>>>>>>>> Kind Regards,
> >>>>>>>>> Marco Lotz
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>>
> >>>>
> >>
> >
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by "Matthias J. Sax" <mj...@apache.org>.
I just re-read the KIP, and I am wondering why we would *only add* new
methods.

Is it an expected use case to only change the main stores but not the
subscription stores?

Wondering if we should deprecate the existing methods that only accept a
single `Materialized` parameter?


-Matthias

On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote:
> Hi everyone,
> 
> I am fine with sticking with Materialized and adding the info to the
> javadoc then - so we keep the footprint smaller.
> I will then update the KIP to match what we discussed here and send it for
> a vote next week.
> 
> I will raise a new bug-fix ticket and change KAFKA-10383
> <https://issues.apache.org/jira/browse/KAFKA-10383> to become a feature -
> if that's ok.
> 
> Cheers,
> Marco
> 
> On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Just catching up here.
>>
>> I agree that we have two issue, and the first (align subscription store
>> to main store) can be done as a bug-fix.
>>
>> For the KIP (that addressed the second), I tend to agree that reusing
>> `Materialized` might be better as it would keep the API surface area
>> smaller.
>>
>>
>> -Matthias
>>
>> On 4/6/21 8:48 AM, John Roesler wrote:
>>> Hi Marco,
>>>
>>> Just a quick clarification: I just reviewed the Materialized class. It
>> looks like the only undesirable members are:
>>> 1. Retention
>>> 2. Key/Value serdes
>>>
>>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for
>> which case the withRetention javadoc already says it’s ignored.
>>>
>>> Perhaps we could just stick with Materialized by adding a note to the
>> Key/Value serdes setters that they are ignored for FKJoin subscription
>> stores?
>>>
>>> Not as elegant as a new config class, but these config classes actually
>> bring a fair amount of complexity, so it might be nice to avoid a new one.
>>>
>>> Thanks,
>>> John
>>>
>>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
>>>> Hi John / Guozhang,
>>>>
>>>> If I correctly understood John's message, he agrees on having the two
>>>> scenarios (piggy-back and api extension). In my view, these two
>> scenarios
>>>> are separate tasks - the first one is a bug-fix and the second is an
>>>> improvement on the current API.
>>>>
>>>> - bug-fix: On the current API, we change its implementation to piggy
>> back
>>>> on the materialization method provided to the materialized parameter.
>> This
>>>> way it will not be opinionated anymore and will not force RocksDb
>>>> persistence for subscription store. Thus an in-memory materialized
>>>> parameter would imply an in-memory subscription store, for example.
>> From my
>>>> understanding, the original implementation tried to be as unopionated
>>>> towards storage methods as possible - and the current implementation is
>> not
>>>> allowing that. Was that the case? We would still need to add this
>>>> modification to the update notes, since it may affect some deployments.
>>>>
>>>> - improvement: We extend the API to allow a user to fine tune different
>>>> materialization methods for subscription and join store. This is done by
>>>> adding a new parameter to the associated methods.
>>>>
>>>> Does it sound reasonable to you Guozhang?
>>>> On your question, does it make sense for an user to decide retention
>>>> policies (withRetention method) or caching for subscription stores? I
>> can
>>>> see why to finetune Logging for example, but in a first moment not these
>>>> other behaviours. That's why I am unsure about using Materialized class.
>>>>
>>>> @John, I will update the KIP with your points as soon as we clarify
>> this.
>>>>
>>>> Cheers,
>>>> Marco
>>>>
>>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com>
>> wrote:
>>>>
>>>>> Thanks Marco / John,
>>>>>
>>>>> I think the arguments for not piggy-backing on the existing
>> Materialized
>>>>> makes sense; on the other hand, if we go this route should we just use
>> a
>>>>> separate Materialized than using an extended /
>>>>> narrowed-scoped MaterializedSubscription since it seems we want to
>> allow
>>>>> users to fully customize this store?
>>>>>
>>>>> Guozhang
>>>>>
>>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org>
>> wrote:
>>>>>
>>>>>> Thanks Marco,
>>>>>>
>>>>>> Sorry if I caused any trouble!
>>>>>>
>>>>>> I don’t remember what I was thinking before, but reasoning about it
>> now,
>>>>>> you might need the fine-grained choice if:
>>>>>>
>>>>>> 1. The number or size of records in each partition of both tables is
>>>>>> small(ish), but the cardinality of the join is very high. Then you
>> might
>>>>>> want an in-memory table store, but an on-disk subscription store.
>>>>>>
>>>>>> 2. The number or size of records is very large, but the join
>> cardinality
>>>>>> is low. Then you might need an on-disk table store, but an in-memory
>>>>>> subscription store.
>>>>>>
>>>>>> 3. You might want a different kind (or differently configured) store
>> for
>>>>>> the subscription store, since it’s access pattern is so different.
>>>>>>
>>>>>> If you buy these, it might be good to put the justification into the
>> KIP.
>>>>>>
>>>>>> I’m in favor of the default you’ve proposed.
>>>>>>
>>>>>> Thanks,
>>>>>> John
>>>>>>
>>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
>>>>>>> Hi Guozhang,
>>>>>>>
>>>>>>> Apologies for the late answer. Originally that was my proposal - to
>>>>>>> piggyback on the provided materialisation method (
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
>>>>>>> John Roesler suggested to us to provide even further fine tuning on
>> API
>>>>>>> level parameters. Maybe we could see this as two sides of the same
>>>>> coin:
>>>>>>>
>>>>>>> - On the current API, we change it to piggy back on the
>> materialization
>>>>>>> method provided to the join store.
>>>>>>> - We extend the API to allow a user to fine tune different
>>>>>> materialization
>>>>>>> methods for subscription and join store.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Marco
>>>>>>>
>>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Marco,
>>>>>>>>
>>>>>>>> Just a quick thought: what if we reuse the existing Materialized
>>>>>> object for
>>>>>>>> both subscription and join stores, instead of introducing a new
>>>>> param /
>>>>>>>> class?
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
>>>>>> contact@marcolotz.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I would like to invite everyone to discuss further KIP-718:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
>>>>>>>>>
>>>>>>>>> I welcome all feedback on it.
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>> Marco Lotz
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -- Guozhang
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -- Guozhang
>>>>>
>>>>
>>
> 

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Marco Aurélio Lotz <co...@marcolotz.com>.
Hi everyone,

I am fine with sticking with Materialized and adding the info to the
javadoc then - so we keep the footprint smaller.
I will then update the KIP to match what we discussed here and send it for
a vote next week.

I will raise a new bug-fix ticket and change KAFKA-10383
<https://issues.apache.org/jira/browse/KAFKA-10383> to become a feature -
if that's ok.

Cheers,
Marco

On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org> wrote:

> Just catching up here.
>
> I agree that we have two issue, and the first (align subscription store
> to main store) can be done as a bug-fix.
>
> For the KIP (that addressed the second), I tend to agree that reusing
> `Materialized` might be better as it would keep the API surface area
> smaller.
>
>
> -Matthias
>
> On 4/6/21 8:48 AM, John Roesler wrote:
> > Hi Marco,
> >
> > Just a quick clarification: I just reviewed the Materialized class. It
> looks like the only undesirable members are:
> > 1. Retention
> > 2. Key/Value serdes
> >
> > The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for
> which case the withRetention javadoc already says it’s ignored.
> >
> > Perhaps we could just stick with Materialized by adding a note to the
> Key/Value serdes setters that they are ignored for FKJoin subscription
> stores?
> >
> > Not as elegant as a new config class, but these config classes actually
> bring a fair amount of complexity, so it might be nice to avoid a new one.
> >
> > Thanks,
> > John
> >
> > On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
> >> Hi John / Guozhang,
> >>
> >> If I correctly understood John's message, he agrees on having the two
> >> scenarios (piggy-back and api extension). In my view, these two
> scenarios
> >> are separate tasks - the first one is a bug-fix and the second is an
> >> improvement on the current API.
> >>
> >> - bug-fix: On the current API, we change its implementation to piggy
> back
> >> on the materialization method provided to the materialized parameter.
> This
> >> way it will not be opinionated anymore and will not force RocksDb
> >> persistence for subscription store. Thus an in-memory materialized
> >> parameter would imply an in-memory subscription store, for example.
> From my
> >> understanding, the original implementation tried to be as unopionated
> >> towards storage methods as possible - and the current implementation is
> not
> >> allowing that. Was that the case? We would still need to add this
> >> modification to the update notes, since it may affect some deployments.
> >>
> >> - improvement: We extend the API to allow a user to fine tune different
> >> materialization methods for subscription and join store. This is done by
> >> adding a new parameter to the associated methods.
> >>
> >> Does it sound reasonable to you Guozhang?
> >> On your question, does it make sense for an user to decide retention
> >> policies (withRetention method) or caching for subscription stores? I
> can
> >> see why to finetune Logging for example, but in a first moment not these
> >> other behaviours. That's why I am unsure about using Materialized class.
> >>
> >> @John, I will update the KIP with your points as soon as we clarify
> this.
> >>
> >> Cheers,
> >> Marco
> >>
> >> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com>
> wrote:
> >>
> >>> Thanks Marco / John,
> >>>
> >>> I think the arguments for not piggy-backing on the existing
> Materialized
> >>> makes sense; on the other hand, if we go this route should we just use
> a
> >>> separate Materialized than using an extended /
> >>> narrowed-scoped MaterializedSubscription since it seems we want to
> allow
> >>> users to fully customize this store?
> >>>
> >>> Guozhang
> >>>
> >>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org>
> wrote:
> >>>
> >>>> Thanks Marco,
> >>>>
> >>>> Sorry if I caused any trouble!
> >>>>
> >>>> I don’t remember what I was thinking before, but reasoning about it
> now,
> >>>> you might need the fine-grained choice if:
> >>>>
> >>>> 1. The number or size of records in each partition of both tables is
> >>>> small(ish), but the cardinality of the join is very high. Then you
> might
> >>>> want an in-memory table store, but an on-disk subscription store.
> >>>>
> >>>> 2. The number or size of records is very large, but the join
> cardinality
> >>>> is low. Then you might need an on-disk table store, but an in-memory
> >>>> subscription store.
> >>>>
> >>>> 3. You might want a different kind (or differently configured) store
> for
> >>>> the subscription store, since it’s access pattern is so different.
> >>>>
> >>>> If you buy these, it might be good to put the justification into the
> KIP.
> >>>>
> >>>> I’m in favor of the default you’ve proposed.
> >>>>
> >>>> Thanks,
> >>>> John
> >>>>
> >>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> >>>>> Hi Guozhang,
> >>>>>
> >>>>> Apologies for the late answer. Originally that was my proposal - to
> >>>>> piggyback on the provided materialisation method (
> >>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
> >>>>> John Roesler suggested to us to provide even further fine tuning on
> API
> >>>>> level parameters. Maybe we could see this as two sides of the same
> >>> coin:
> >>>>>
> >>>>> - On the current API, we change it to piggy back on the
> materialization
> >>>>> method provided to the join store.
> >>>>> - We extend the API to allow a user to fine tune different
> >>>> materialization
> >>>>> methods for subscription and join store.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Cheers,
> >>>>> Marco
> >>>>>
> >>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> Thanks Marco,
> >>>>>>
> >>>>>> Just a quick thought: what if we reuse the existing Materialized
> >>>> object for
> >>>>>> both subscription and join stores, instead of introducing a new
> >>> param /
> >>>>>> class?
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> >>>> contact@marcolotz.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> I would like to invite everyone to discuss further KIP-718:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >>>>>>>
> >>>>>>> I welcome all feedback on it.
> >>>>>>>
> >>>>>>> Kind Regards,
> >>>>>>> Marco Lotz
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> -- Guozhang
> >>>
> >>
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by "Matthias J. Sax" <mj...@apache.org>.
Just catching up here.

I agree that we have two issue, and the first (align subscription store
to main store) can be done as a bug-fix.

For the KIP (that addressed the second), I tend to agree that reusing
`Materialized` might be better as it would keep the API surface area
smaller.


-Matthias

On 4/6/21 8:48 AM, John Roesler wrote:
> Hi Marco,
> 
> Just a quick clarification: I just reviewed the Materialized class. It looks like the only undesirable members are:
> 1. Retention
> 2. Key/Value serdes 
> 
> The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for which case the withRetention javadoc already says it’s ignored.
> 
> Perhaps we could just stick with Materialized by adding a note to the Key/Value serdes setters that they are ignored for FKJoin subscription stores?
> 
> Not as elegant as a new config class, but these config classes actually bring a fair amount of complexity, so it might be nice to avoid a new one.
> 
> Thanks,
> John
> 
> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
>> Hi John / Guozhang,
>>
>> If I correctly understood John's message, he agrees on having the two
>> scenarios (piggy-back and api extension). In my view, these two scenarios
>> are separate tasks - the first one is a bug-fix and the second is an
>> improvement on the current API.
>>
>> - bug-fix: On the current API, we change its implementation to piggy back
>> on the materialization method provided to the materialized parameter. This
>> way it will not be opinionated anymore and will not force RocksDb
>> persistence for subscription store. Thus an in-memory materialized
>> parameter would imply an in-memory subscription store, for example. From my
>> understanding, the original implementation tried to be as unopionated
>> towards storage methods as possible - and the current implementation is not
>> allowing that. Was that the case? We would still need to add this
>> modification to the update notes, since it may affect some deployments.
>>
>> - improvement: We extend the API to allow a user to fine tune different
>> materialization methods for subscription and join store. This is done by
>> adding a new parameter to the associated methods.
>>
>> Does it sound reasonable to you Guozhang?
>> On your question, does it make sense for an user to decide retention
>> policies (withRetention method) or caching for subscription stores? I can
>> see why to finetune Logging for example, but in a first moment not these
>> other behaviours. That's why I am unsure about using Materialized class.
>>
>> @John, I will update the KIP with your points as soon as we clarify this.
>>
>> Cheers,
>> Marco
>>
>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com> wrote:
>>
>>> Thanks Marco / John,
>>>
>>> I think the arguments for not piggy-backing on the existing Materialized
>>> makes sense; on the other hand, if we go this route should we just use a
>>> separate Materialized than using an extended /
>>> narrowed-scoped MaterializedSubscription since it seems we want to allow
>>> users to fully customize this store?
>>>
>>> Guozhang
>>>
>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org> wrote:
>>>
>>>> Thanks Marco,
>>>>
>>>> Sorry if I caused any trouble!
>>>>
>>>> I don’t remember what I was thinking before, but reasoning about it now,
>>>> you might need the fine-grained choice if:
>>>>
>>>> 1. The number or size of records in each partition of both tables is
>>>> small(ish), but the cardinality of the join is very high. Then you might
>>>> want an in-memory table store, but an on-disk subscription store.
>>>>
>>>> 2. The number or size of records is very large, but the join cardinality
>>>> is low. Then you might need an on-disk table store, but an in-memory
>>>> subscription store.
>>>>
>>>> 3. You might want a different kind (or differently configured) store for
>>>> the subscription store, since it’s access pattern is so different.
>>>>
>>>> If you buy these, it might be good to put the justification into the KIP.
>>>>
>>>> I’m in favor of the default you’ve proposed.
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
>>>>> Hi Guozhang,
>>>>>
>>>>> Apologies for the late answer. Originally that was my proposal - to
>>>>> piggyback on the provided materialisation method (
>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
>>>>> John Roesler suggested to us to provide even further fine tuning on API
>>>>> level parameters. Maybe we could see this as two sides of the same
>>> coin:
>>>>>
>>>>> - On the current API, we change it to piggy back on the materialization
>>>>> method provided to the join store.
>>>>> - We extend the API to allow a user to fine tune different
>>>> materialization
>>>>> methods for subscription and join store.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Cheers,
>>>>> Marco
>>>>>
>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
>>> wrote:
>>>>>
>>>>>> Thanks Marco,
>>>>>>
>>>>>> Just a quick thought: what if we reuse the existing Materialized
>>>> object for
>>>>>> both subscription and join stores, instead of introducing a new
>>> param /
>>>>>> class?
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
>>>> contact@marcolotz.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I would like to invite everyone to discuss further KIP-718:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
>>>>>>>
>>>>>>> I welcome all feedback on it.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Marco Lotz
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by John Roesler <vv...@apache.org>.
Hi Marco,

Just a quick clarification: I just reviewed the Materialized class. It looks like the only undesirable members are:
1. Retention
2. Key/Value serdes 

The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for which case the withRetention javadoc already says it’s ignored.

Perhaps we could just stick with Materialized by adding a note to the Key/Value serdes setters that they are ignored for FKJoin subscription stores?

Not as elegant as a new config class, but these config classes actually bring a fair amount of complexity, so it might be nice to avoid a new one.

Thanks,
John

On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
> Hi John / Guozhang,
> 
> If I correctly understood John's message, he agrees on having the two
> scenarios (piggy-back and api extension). In my view, these two scenarios
> are separate tasks - the first one is a bug-fix and the second is an
> improvement on the current API.
> 
> - bug-fix: On the current API, we change its implementation to piggy back
> on the materialization method provided to the materialized parameter. This
> way it will not be opinionated anymore and will not force RocksDb
> persistence for subscription store. Thus an in-memory materialized
> parameter would imply an in-memory subscription store, for example. From my
> understanding, the original implementation tried to be as unopionated
> towards storage methods as possible - and the current implementation is not
> allowing that. Was that the case? We would still need to add this
> modification to the update notes, since it may affect some deployments.
> 
> - improvement: We extend the API to allow a user to fine tune different
> materialization methods for subscription and join store. This is done by
> adding a new parameter to the associated methods.
> 
> Does it sound reasonable to you Guozhang?
> On your question, does it make sense for an user to decide retention
> policies (withRetention method) or caching for subscription stores? I can
> see why to finetune Logging for example, but in a first moment not these
> other behaviours. That's why I am unsure about using Materialized class.
> 
> @John, I will update the KIP with your points as soon as we clarify this.
> 
> Cheers,
> Marco
> 
> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com> wrote:
> 
> > Thanks Marco / John,
> >
> > I think the arguments for not piggy-backing on the existing Materialized
> > makes sense; on the other hand, if we go this route should we just use a
> > separate Materialized than using an extended /
> > narrowed-scoped MaterializedSubscription since it seems we want to allow
> > users to fully customize this store?
> >
> > Guozhang
> >
> > On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org> wrote:
> >
> > > Thanks Marco,
> > >
> > > Sorry if I caused any trouble!
> > >
> > > I don’t remember what I was thinking before, but reasoning about it now,
> > > you might need the fine-grained choice if:
> > >
> > > 1. The number or size of records in each partition of both tables is
> > > small(ish), but the cardinality of the join is very high. Then you might
> > > want an in-memory table store, but an on-disk subscription store.
> > >
> > > 2. The number or size of records is very large, but the join cardinality
> > > is low. Then you might need an on-disk table store, but an in-memory
> > > subscription store.
> > >
> > > 3. You might want a different kind (or differently configured) store for
> > > the subscription store, since it’s access pattern is so different.
> > >
> > > If you buy these, it might be good to put the justification into the KIP.
> > >
> > > I’m in favor of the default you’ve proposed.
> > >
> > > Thanks,
> > > John
> > >
> > > On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> > > > Hi Guozhang,
> > > >
> > > > Apologies for the late answer. Originally that was my proposal - to
> > > > piggyback on the provided materialisation method (
> > > > https://issues.apache.org/jira/browse/KAFKA-10383).
> > > > John Roesler suggested to us to provide even further fine tuning on API
> > > > level parameters. Maybe we could see this as two sides of the same
> > coin:
> > > >
> > > > - On the current API, we change it to piggy back on the materialization
> > > > method provided to the join store.
> > > > - We extend the API to allow a user to fine tune different
> > > materialization
> > > > methods for subscription and join store.
> > > >
> > > > What do you think?
> > > >
> > > > Cheers,
> > > > Marco
> > > >
> > > > On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > > >
> > > > > Thanks Marco,
> > > > >
> > > > > Just a quick thought: what if we reuse the existing Materialized
> > > object for
> > > > > both subscription and join stores, instead of introducing a new
> > param /
> > > > > class?
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> > > contact@marcolotz.com>
> > > > > wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > I would like to invite everyone to discuss further KIP-718:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> > > > > >
> > > > > > I welcome all feedback on it.
> > > > > >
> > > > > > Kind Regards,
> > > > > > Marco Lotz
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Marco Aurélio Lotz <co...@marcolotz.com>.
Hi John / Guozhang,

If I correctly understood John's message, he agrees on having the two
scenarios (piggy-back and api extension). In my view, these two scenarios
are separate tasks - the first one is a bug-fix and the second is an
improvement on the current API.

- bug-fix: On the current API, we change its implementation to piggy back
on the materialization method provided to the materialized parameter. This
way it will not be opinionated anymore and will not force RocksDb
persistence for subscription store. Thus an in-memory materialized
parameter would imply an in-memory subscription store, for example. From my
understanding, the original implementation tried to be as unopionated
towards storage methods as possible - and the current implementation is not
allowing that. Was that the case? We would still need to add this
modification to the update notes, since it may affect some deployments.

- improvement: We extend the API to allow a user to fine tune different
materialization methods for subscription and join store. This is done by
adding a new parameter to the associated methods.

Does it sound reasonable to you Guozhang?
On your question, does it make sense for an user to decide retention
policies (withRetention method) or caching for subscription stores? I can
see why to finetune Logging for example, but in a first moment not these
other behaviours. That's why I am unsure about using Materialized class.

@John, I will update the KIP with your points as soon as we clarify this.

Cheers,
Marco

On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wa...@gmail.com> wrote:

> Thanks Marco / John,
>
> I think the arguments for not piggy-backing on the existing Materialized
> makes sense; on the other hand, if we go this route should we just use a
> separate Materialized than using an extended /
> narrowed-scoped MaterializedSubscription since it seems we want to allow
> users to fully customize this store?
>
> Guozhang
>
> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org> wrote:
>
> > Thanks Marco,
> >
> > Sorry if I caused any trouble!
> >
> > I don’t remember what I was thinking before, but reasoning about it now,
> > you might need the fine-grained choice if:
> >
> > 1. The number or size of records in each partition of both tables is
> > small(ish), but the cardinality of the join is very high. Then you might
> > want an in-memory table store, but an on-disk subscription store.
> >
> > 2. The number or size of records is very large, but the join cardinality
> > is low. Then you might need an on-disk table store, but an in-memory
> > subscription store.
> >
> > 3. You might want a different kind (or differently configured) store for
> > the subscription store, since it’s access pattern is so different.
> >
> > If you buy these, it might be good to put the justification into the KIP.
> >
> > I’m in favor of the default you’ve proposed.
> >
> > Thanks,
> > John
> >
> > On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> > > Hi Guozhang,
> > >
> > > Apologies for the late answer. Originally that was my proposal - to
> > > piggyback on the provided materialisation method (
> > > https://issues.apache.org/jira/browse/KAFKA-10383).
> > > John Roesler suggested to us to provide even further fine tuning on API
> > > level parameters. Maybe we could see this as two sides of the same
> coin:
> > >
> > > - On the current API, we change it to piggy back on the materialization
> > > method provided to the join store.
> > > - We extend the API to allow a user to fine tune different
> > materialization
> > > methods for subscription and join store.
> > >
> > > What do you think?
> > >
> > > Cheers,
> > > Marco
> > >
> > > On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> > >
> > > > Thanks Marco,
> > > >
> > > > Just a quick thought: what if we reuse the existing Materialized
> > object for
> > > > both subscription and join stores, instead of introducing a new
> param /
> > > > class?
> > > >
> > > > Guozhang
> > > >
> > > > On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> > contact@marcolotz.com>
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > I would like to invite everyone to discuss further KIP-718:
> > > > >
> > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> > > > >
> > > > > I welcome all feedback on it.
> > > > >
> > > > > Kind Regards,
> > > > > Marco Lotz
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Marco / John,

I think the arguments for not piggy-backing on the existing Materialized
makes sense; on the other hand, if we go this route should we just use a
separate Materialized than using an extended /
narrowed-scoped MaterializedSubscription since it seems we want to allow
users to fully customize this store?

Guozhang

On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vv...@apache.org> wrote:

> Thanks Marco,
>
> Sorry if I caused any trouble!
>
> I don’t remember what I was thinking before, but reasoning about it now,
> you might need the fine-grained choice if:
>
> 1. The number or size of records in each partition of both tables is
> small(ish), but the cardinality of the join is very high. Then you might
> want an in-memory table store, but an on-disk subscription store.
>
> 2. The number or size of records is very large, but the join cardinality
> is low. Then you might need an on-disk table store, but an in-memory
> subscription store.
>
> 3. You might want a different kind (or differently configured) store for
> the subscription store, since it’s access pattern is so different.
>
> If you buy these, it might be good to put the justification into the KIP.
>
> I’m in favor of the default you’ve proposed.
>
> Thanks,
> John
>
> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> > Hi Guozhang,
> >
> > Apologies for the late answer. Originally that was my proposal - to
> > piggyback on the provided materialisation method (
> > https://issues.apache.org/jira/browse/KAFKA-10383).
> > John Roesler suggested to us to provide even further fine tuning on API
> > level parameters. Maybe we could see this as two sides of the same coin:
> >
> > - On the current API, we change it to piggy back on the materialization
> > method provided to the join store.
> > - We extend the API to allow a user to fine tune different
> materialization
> > methods for subscription and join store.
> >
> > What do you think?
> >
> > Cheers,
> > Marco
> >
> > On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com> wrote:
> >
> > > Thanks Marco,
> > >
> > > Just a quick thought: what if we reuse the existing Materialized
> object for
> > > both subscription and join stores, instead of introducing a new param /
> > > class?
> > >
> > > Guozhang
> > >
> > > On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> contact@marcolotz.com>
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I would like to invite everyone to discuss further KIP-718:
> > > >
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> > > >
> > > > I welcome all feedback on it.
> > > >
> > > > Kind Regards,
> > > > Marco Lotz
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by John Roesler <vv...@apache.org>.
Thanks Marco,

Sorry if I caused any trouble!

I don’t remember what I was thinking before, but reasoning about it now, you might need the fine-grained choice if:

1. The number or size of records in each partition of both tables is small(ish), but the cardinality of the join is very high. Then you might want an in-memory table store, but an on-disk subscription store. 

2. The number or size of records is very large, but the join cardinality is low. Then you might need an on-disk table store, but an in-memory subscription store. 

3. You might want a different kind (or differently configured) store for the subscription store, since it’s access pattern is so different. 

If you buy these, it might be good to put the justification into the KIP. 

I’m in favor of the default you’ve proposed. 

Thanks,
John

On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> Hi Guozhang,
> 
> Apologies for the late answer. Originally that was my proposal - to
> piggyback on the provided materialisation method (
> https://issues.apache.org/jira/browse/KAFKA-10383).
> John Roesler suggested to us to provide even further fine tuning on API
> level parameters. Maybe we could see this as two sides of the same coin:
> 
> - On the current API, we change it to piggy back on the materialization
> method provided to the join store.
> - We extend the API to allow a user to fine tune different materialization
> methods for subscription and join store.
> 
> What do you think?
> 
> Cheers,
> Marco
> 
> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com> wrote:
> 
> > Thanks Marco,
> >
> > Just a quick thought: what if we reuse the existing Materialized object for
> > both subscription and join stores, instead of introducing a new param /
> > class?
> >
> > Guozhang
> >
> > On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <co...@marcolotz.com>
> > wrote:
> >
> > > Hi folks,
> > >
> > > I would like to invite everyone to discuss further KIP-718:
> > >
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> > >
> > > I welcome all feedback on it.
> > >
> > > Kind Regards,
> > > Marco Lotz
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Marco Aurélio Lotz <ma...@marcolotz.com>.
Hi Guozhang,

Apologies for the late answer. Originally that was my proposal - to
piggyback on the provided materialisation method (
https://issues.apache.org/jira/browse/KAFKA-10383).
John Roesler suggested to us to provide even further fine tuning on API
level parameters. Maybe we could see this as two sides of the same coin:

- On the current API, we change it to piggy back on the materialization
method provided to the join store.
- We extend the API to allow a user to fine tune different materialization
methods for subscription and join store.

What do you think?

Cheers,
Marco

On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wa...@gmail.com> wrote:

> Thanks Marco,
>
> Just a quick thought: what if we reuse the existing Materialized object for
> both subscription and join stores, instead of introducing a new param /
> class?
>
> Guozhang
>
> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <co...@marcolotz.com>
> wrote:
>
> > Hi folks,
> >
> > I would like to invite everyone to discuss further KIP-718:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >
> > I welcome all feedback on it.
> >
> > Kind Regards,
> > Marco Lotz
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Marco,

Just a quick thought: what if we reuse the existing Materialized object for
both subscription and join stores, instead of introducing a new param /
class?

Guozhang

On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <co...@marcolotz.com>
wrote:

> Hi folks,
>
> I would like to invite everyone to discuss further KIP-718:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
>
> I welcome all feedback on it.
>
> Kind Regards,
> Marco Lotz
>


-- 
-- Guozhang