You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Bill Bejeck <bb...@gmail.com> on 2018/05/07 17:26:25 UTC

[DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

All,
I'd like to start a discussion about adding a configuration parameter
allowing for the forthcoming topology optimization to be optional via
configuration.


The KIP can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+Configuration+Allowing+for+Optional+Topology+Optimization

Thanks,
Bill

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Bill Bejeck <bb...@gmail.com>.
At this point, I think the overall discussion on the new config is
completed and I'll start a voting thread.

Thanks,
Bill



On Tue, May 15, 2018 at 4:37 PM, Bill Bejeck <bb...@gmail.com> wrote:

> Thanks for the comments, John.  I've updated the config name on the KIP..
>
> -Bill
>
>
> On Tue, May 15, 2018 at 2:22 PM, John Roesler <jo...@confluent.io> wrote:
>
>> Hi Bill,
>>
>> Thanks for the KIP. Now that we're using strings describing the "set of
>> optimizations", such as "none" and "all", should we change the config name
>> to just "topology.optimizations"?
>>
>> The "enable." feels like a holdover from the boolean-valued config.
>>
>> Thanks,
>> -John
>>
>> On Tue, May 8, 2018 at 9:13 PM, Bill Bejeck <bb...@gmail.com> wrote:
>>
>> > Thanks, Guozhang and Matthias for the comments.
>> >
>> > I was thinking of changing the config type back to a String and
>> enforcing
>> > the values to be "true" or "false", but "none" or "all" is just as good.
>> >
>> > Since those values seem to work, I'll update the KIP accordingly.
>> >
>> > Thanks,
>> > Bill
>> >
>> > On Tue, May 8, 2018 at 9:38 PM, Matthias J. Sax <ma...@confluent.io>
>> > wrote:
>> >
>> > > Sounds good to me.
>> > >
>> > > On 5/8/18 5:47 PM, Guozhang Wang wrote:
>> > > > Thanks Matthias.
>> > > >
>> > > > I was also thinking about whether in the future we'd want to enable
>> > > > optimizations at different levels that may or may not impact
>> > > compatibility.
>> > > > That's why I asked if we have thought about "allowing part of the
>> > > > optimizations in the future".
>> > > >
>> > > > With that in mind, I'd change my preference and take string typed
>> > config.
>> > > > Even if we ended up with no finer grained optimizations in the
>> future
>> > we
>> > > > can still have the string typed parameter with only two allowed
>> values,
>> > > > like what we did for EOS. But I think in 2.0 allowing any not-null
>> > string
>> > > > values as enabled is still a bit odd, so how about we make two
>> string
>> > > > values, like `none` (default value) and `all`?
>> > > >
>> > > >
>> > > > Guozhang
>> > > >
>> > > >
>> > > > On Tue, May 8, 2018 at 3:35 PM, Matthias J. Sax <
>> matthias@confluent.io
>> > >
>> > > > wrote:
>> > > >
>> > > >> One thought I want to bring up about switching optimization on/off:
>> > > >>
>> > > >> While for the initial release, a boolean flag seems to be
>> sufficient,
>> > I
>> > > >> could imagine that we apply different and potentially
>> > > >> upgrade-incompatible optimizations in future releases. Thus, to me
>> it
>> > > >> would make sense to use a String type, to indicate what
>> optimizations
>> > > >> are possible based on the release. For example, in next release we
>> > > >> accept `null` for disabled and "2.0". If there are any incompatible
>> > > >> changes, people can stay with "2.0" optimizations level when
>> upgrading
>> > > >> to "2.1" while new applications can use "2.1" optimization level.
>> Old
>> > > >> applications would need to do an offline upgrade to get "2.1"
>> > > >> optimizations.
>> > > >>
>> > > >> I agree with Bill, that switching individual optimizations on/off
>> is
>> > too
>> > > >> fine grained and hard to maintain. However, for compatibility, it
>> > might
>> > > >> make sense, to have certain "levels of optimizations" (based on the
>> > > >> release) that allow users to stay with on an older level for
>> upgrade
>> > > >> purpose. Using the release numbers to encode those "levels" is
>> easy to
>> > > >> understand for users and should minimize the mental effort to get
>> the
>> > > >> config right.
>> > > >>
>> > > >> Let me know what you think about this.
>> > > >>
>> > > >>
>> > > >> -Matthias
>> > > >>
>> > > >> On 5/8/18 3:08 PM, Ted Yu wrote:
>> > > >>> Bill:That makes sense.
>> > > >>> Using boolean should suffice.
>> > > >>> -------- Original message --------From: Bill Bejeck <
>> > bbejeck@gmail.com
>> > > >
>> > > >> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org
>> Subject:
>> > > Re:
>> > > >> [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional
>> > > Topology
>> > > >> Optimization
>> > > >>> Thanks for the comments Guozhang and Ted.
>> > > >>>
>> > > >>> Guozhang:
>> > > >>>
>> > > >>>   1) I'll update the KIP in the "Compatibility, Deprecation and
>> > > Migration
>> > > >>> Plan" with the expected impact of turning on optimization. But at
>> > this
>> > > >>> point, I have not identified a migration plan that doesn't involve
>> > > having
>> > > >>> to stop all instances and restart.
>> > > >>>
>> > > >>>   2) Setting the type to String was just so we could have the
>> default
>> > > of
>> > > >>> null, indicating run no optimizations. As for partially enabling
>> > > >>> optimizations, I'm not sure I had that in mind, at least at this
>> > point.
>> > > >>>  To me having the topology optimized should be an "all or nothing"
>> > > >>> proposition.  For now, I'll change the type to boolean (with a
>> > default
>> > > >>> value of false) to better reflect the intent of the configuration.
>> > > >>>
>> > > >>> Ted, thanks again for the comments.
>> > > >>>
>> > > >>> The intent of the new configuration, as I mentioned above, is
>> whether
>> > > to
>> > > >>> turn optimization on or off in aggregate.  The main reason for
>> having
>> > > the
>> > > >>> configuration is for backward compatibility.  As optimization may
>> > > result
>> > > >> in
>> > > >>> a slightly different topology from the original one, we need to
>> allow
>> > > >> users
>> > > >>> to turn it off until they are ready for migrating to the new
>> > topology.
>> > > >>>
>> > > >>> I don't think having to select each optimization is a feasible
>> > > >> solution.  I
>> > > >>> say this for few reasons:
>> > > >>>
>> > > >>>    1. Maintenance will be an issue.  While the initial target is
>> > only a
>> > > >>>    small number of optimizations, but as the number grows, having
>> to
>> > > >> maintain
>> > > >>>    the number of options in the code will difficult at best.
>> > > >>>    2. Users probably don't want to reason about which combination
>> of
>> > > >>>    optimizations to have.  Thinking about which optimizations to
>> turn
>> > > on
>> > > >> or
>> > > >>>    off raises the complexity of deploying an application.
>> > > >>>    3. When using a relational database or other another framework
>> > that
>> > > >> may
>> > > >>>    optimize queries, as far as I know, it's not a choice of which
>> > > >>>    optimizations to have, they are performed automatically.
>> > > >>>
>> > > >>> Does this make sense?
>> > > >>>
>> > > >>>
>> > > >>> On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com>
>> wrote:
>> > > >>>
>> > > >>>> There are 4 subtasks for KAFKA-6034
>> > > >>>>
>> > > >>>> If each optimization can be switched on/off, there should be 4
>> enums
>> > > for
>> > > >>>> the switch.
>> > > >>>>
>> > > >>>> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <
>> wangguoz@gmail.com>
>> > > >> wrote:
>> > > >>>>
>> > > >>>>> Hello Bill,
>> > > >>>>>
>> > > >>>>> Thanks for the KIP. My comments are the following:
>> > > >>>>>
>> > > >>>>> 1) We should state clearly what are the expected impact in
>> > > >>>> "Compatibility,
>> > > >>>>> Deprecation, and Migration Plan" when optimization is turned on.
>> > > >>>>>
>> > > >>>>> 2) I'm wondering why we define this config as a String typed,
>> > rather
>> > > >> than
>> > > >>>>> boolean typed? Or are you expecting to extend it to allow
>> partially
>> > > >>>>> enabling part of the optimizations in the future?
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> Hi Ted,
>> > > >>>>>
>> > > >>>>> The cover JIRA for topology optimization can be found here:
>> > > >>>>> https://issues.apache.org/jira/browse/KAFKA-6034
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> Guozhang
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com>
>> > wrote:
>> > > >>>>>
>> > > >>>>>> Which JIRA is for the Topology Optimization itself ?
>> > > >>>>>>
>> > > >>>>>> Thanks
>> > > >>>>>>
>> > > >>>>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <
>> bbejeck@gmail.com>
>> > > >>>> wrote:
>> > > >>>>>>
>> > > >>>>>>> All,
>> > > >>>>>>> I'd like to start a discussion about adding a configuration
>> > > parameter
>> > > >>>>>>> allowing for the forthcoming topology optimization to be
>> optional
>> > > via
>> > > >>>>>>> configuration.
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> The KIP can be found here:
>> > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > >>>>>> 295%3A+Add+Streams+
>> > > >>>>>>> Configuration+Allowing+for+Optional+Topology+Optimization
>> > > >>>>>>>
>> > > >>>>>>> Thanks,
>> > > >>>>>>> Bill
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> --
>> > > >>>>> -- Guozhang
>> > > >>>>>
>> > > >>>>
>> > > >>
>> > > >>
>> > > >
>> > > >
>> > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Bill Bejeck <bb...@gmail.com>.
Thanks for the comments, John.  I've updated the config name on the KIP..

-Bill


On Tue, May 15, 2018 at 2:22 PM, John Roesler <jo...@confluent.io> wrote:

> Hi Bill,
>
> Thanks for the KIP. Now that we're using strings describing the "set of
> optimizations", such as "none" and "all", should we change the config name
> to just "topology.optimizations"?
>
> The "enable." feels like a holdover from the boolean-valued config.
>
> Thanks,
> -John
>
> On Tue, May 8, 2018 at 9:13 PM, Bill Bejeck <bb...@gmail.com> wrote:
>
> > Thanks, Guozhang and Matthias for the comments.
> >
> > I was thinking of changing the config type back to a String and enforcing
> > the values to be "true" or "false", but "none" or "all" is just as good.
> >
> > Since those values seem to work, I'll update the KIP accordingly.
> >
> > Thanks,
> > Bill
> >
> > On Tue, May 8, 2018 at 9:38 PM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> > > Sounds good to me.
> > >
> > > On 5/8/18 5:47 PM, Guozhang Wang wrote:
> > > > Thanks Matthias.
> > > >
> > > > I was also thinking about whether in the future we'd want to enable
> > > > optimizations at different levels that may or may not impact
> > > compatibility.
> > > > That's why I asked if we have thought about "allowing part of the
> > > > optimizations in the future".
> > > >
> > > > With that in mind, I'd change my preference and take string typed
> > config.
> > > > Even if we ended up with no finer grained optimizations in the future
> > we
> > > > can still have the string typed parameter with only two allowed
> values,
> > > > like what we did for EOS. But I think in 2.0 allowing any not-null
> > string
> > > > values as enabled is still a bit odd, so how about we make two string
> > > > values, like `none` (default value) and `all`?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Tue, May 8, 2018 at 3:35 PM, Matthias J. Sax <
> matthias@confluent.io
> > >
> > > > wrote:
> > > >
> > > >> One thought I want to bring up about switching optimization on/off:
> > > >>
> > > >> While for the initial release, a boolean flag seems to be
> sufficient,
> > I
> > > >> could imagine that we apply different and potentially
> > > >> upgrade-incompatible optimizations in future releases. Thus, to me
> it
> > > >> would make sense to use a String type, to indicate what
> optimizations
> > > >> are possible based on the release. For example, in next release we
> > > >> accept `null` for disabled and "2.0". If there are any incompatible
> > > >> changes, people can stay with "2.0" optimizations level when
> upgrading
> > > >> to "2.1" while new applications can use "2.1" optimization level.
> Old
> > > >> applications would need to do an offline upgrade to get "2.1"
> > > >> optimizations.
> > > >>
> > > >> I agree with Bill, that switching individual optimizations on/off is
> > too
> > > >> fine grained and hard to maintain. However, for compatibility, it
> > might
> > > >> make sense, to have certain "levels of optimizations" (based on the
> > > >> release) that allow users to stay with on an older level for upgrade
> > > >> purpose. Using the release numbers to encode those "levels" is easy
> to
> > > >> understand for users and should minimize the mental effort to get
> the
> > > >> config right.
> > > >>
> > > >> Let me know what you think about this.
> > > >>
> > > >>
> > > >> -Matthias
> > > >>
> > > >> On 5/8/18 3:08 PM, Ted Yu wrote:
> > > >>> Bill:That makes sense.
> > > >>> Using boolean should suffice.
> > > >>> -------- Original message --------From: Bill Bejeck <
> > bbejeck@gmail.com
> > > >
> > > >> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org
> Subject:
> > > Re:
> > > >> [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional
> > > Topology
> > > >> Optimization
> > > >>> Thanks for the comments Guozhang and Ted.
> > > >>>
> > > >>> Guozhang:
> > > >>>
> > > >>>   1) I'll update the KIP in the "Compatibility, Deprecation and
> > > Migration
> > > >>> Plan" with the expected impact of turning on optimization. But at
> > this
> > > >>> point, I have not identified a migration plan that doesn't involve
> > > having
> > > >>> to stop all instances and restart.
> > > >>>
> > > >>>   2) Setting the type to String was just so we could have the
> default
> > > of
> > > >>> null, indicating run no optimizations. As for partially enabling
> > > >>> optimizations, I'm not sure I had that in mind, at least at this
> > point.
> > > >>>  To me having the topology optimized should be an "all or nothing"
> > > >>> proposition.  For now, I'll change the type to boolean (with a
> > default
> > > >>> value of false) to better reflect the intent of the configuration.
> > > >>>
> > > >>> Ted, thanks again for the comments.
> > > >>>
> > > >>> The intent of the new configuration, as I mentioned above, is
> whether
> > > to
> > > >>> turn optimization on or off in aggregate.  The main reason for
> having
> > > the
> > > >>> configuration is for backward compatibility.  As optimization may
> > > result
> > > >> in
> > > >>> a slightly different topology from the original one, we need to
> allow
> > > >> users
> > > >>> to turn it off until they are ready for migrating to the new
> > topology.
> > > >>>
> > > >>> I don't think having to select each optimization is a feasible
> > > >> solution.  I
> > > >>> say this for few reasons:
> > > >>>
> > > >>>    1. Maintenance will be an issue.  While the initial target is
> > only a
> > > >>>    small number of optimizations, but as the number grows, having
> to
> > > >> maintain
> > > >>>    the number of options in the code will difficult at best.
> > > >>>    2. Users probably don't want to reason about which combination
> of
> > > >>>    optimizations to have.  Thinking about which optimizations to
> turn
> > > on
> > > >> or
> > > >>>    off raises the complexity of deploying an application.
> > > >>>    3. When using a relational database or other another framework
> > that
> > > >> may
> > > >>>    optimize queries, as far as I know, it's not a choice of which
> > > >>>    optimizations to have, they are performed automatically.
> > > >>>
> > > >>> Does this make sense?
> > > >>>
> > > >>>
> > > >>> On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > >>>
> > > >>>> There are 4 subtasks for KAFKA-6034
> > > >>>>
> > > >>>> If each optimization can be switched on/off, there should be 4
> enums
> > > for
> > > >>>> the switch.
> > > >>>>
> > > >>>> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wangguoz@gmail.com
> >
> > > >> wrote:
> > > >>>>
> > > >>>>> Hello Bill,
> > > >>>>>
> > > >>>>> Thanks for the KIP. My comments are the following:
> > > >>>>>
> > > >>>>> 1) We should state clearly what are the expected impact in
> > > >>>> "Compatibility,
> > > >>>>> Deprecation, and Migration Plan" when optimization is turned on.
> > > >>>>>
> > > >>>>> 2) I'm wondering why we define this config as a String typed,
> > rather
> > > >> than
> > > >>>>> boolean typed? Or are you expecting to extend it to allow
> partially
> > > >>>>> enabling part of the optimizations in the future?
> > > >>>>>
> > > >>>>>
> > > >>>>> Hi Ted,
> > > >>>>>
> > > >>>>> The cover JIRA for topology optimization can be found here:
> > > >>>>> https://issues.apache.org/jira/browse/KAFKA-6034
> > > >>>>>
> > > >>>>>
> > > >>>>> Guozhang
> > > >>>>>
> > > >>>>>
> > > >>>>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > >>>>>
> > > >>>>>> Which JIRA is for the Topology Optimization itself ?
> > > >>>>>>
> > > >>>>>> Thanks
> > > >>>>>>
> > > >>>>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bbejeck@gmail.com
> >
> > > >>>> wrote:
> > > >>>>>>
> > > >>>>>>> All,
> > > >>>>>>> I'd like to start a discussion about adding a configuration
> > > parameter
> > > >>>>>>> allowing for the forthcoming topology optimization to be
> optional
> > > via
> > > >>>>>>> configuration.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> The KIP can be found here:
> > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >>>>>> 295%3A+Add+Streams+
> > > >>>>>>> Configuration+Allowing+for+Optional+Topology+Optimization
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Bill
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> -- Guozhang
> > > >>>>>
> > > >>>>
> > > >>
> > > >>
> > > >
> > > >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by John Roesler <jo...@confluent.io>.
Hi Bill,

Thanks for the KIP. Now that we're using strings describing the "set of
optimizations", such as "none" and "all", should we change the config name
to just "topology.optimizations"?

The "enable." feels like a holdover from the boolean-valued config.

Thanks,
-John

On Tue, May 8, 2018 at 9:13 PM, Bill Bejeck <bb...@gmail.com> wrote:

> Thanks, Guozhang and Matthias for the comments.
>
> I was thinking of changing the config type back to a String and enforcing
> the values to be "true" or "false", but "none" or "all" is just as good.
>
> Since those values seem to work, I'll update the KIP accordingly.
>
> Thanks,
> Bill
>
> On Tue, May 8, 2018 at 9:38 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Sounds good to me.
> >
> > On 5/8/18 5:47 PM, Guozhang Wang wrote:
> > > Thanks Matthias.
> > >
> > > I was also thinking about whether in the future we'd want to enable
> > > optimizations at different levels that may or may not impact
> > compatibility.
> > > That's why I asked if we have thought about "allowing part of the
> > > optimizations in the future".
> > >
> > > With that in mind, I'd change my preference and take string typed
> config.
> > > Even if we ended up with no finer grained optimizations in the future
> we
> > > can still have the string typed parameter with only two allowed values,
> > > like what we did for EOS. But I think in 2.0 allowing any not-null
> string
> > > values as enabled is still a bit odd, so how about we make two string
> > > values, like `none` (default value) and `all`?
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Tue, May 8, 2018 at 3:35 PM, Matthias J. Sax <matthias@confluent.io
> >
> > > wrote:
> > >
> > >> One thought I want to bring up about switching optimization on/off:
> > >>
> > >> While for the initial release, a boolean flag seems to be sufficient,
> I
> > >> could imagine that we apply different and potentially
> > >> upgrade-incompatible optimizations in future releases. Thus, to me it
> > >> would make sense to use a String type, to indicate what optimizations
> > >> are possible based on the release. For example, in next release we
> > >> accept `null` for disabled and "2.0". If there are any incompatible
> > >> changes, people can stay with "2.0" optimizations level when upgrading
> > >> to "2.1" while new applications can use "2.1" optimization level. Old
> > >> applications would need to do an offline upgrade to get "2.1"
> > >> optimizations.
> > >>
> > >> I agree with Bill, that switching individual optimizations on/off is
> too
> > >> fine grained and hard to maintain. However, for compatibility, it
> might
> > >> make sense, to have certain "levels of optimizations" (based on the
> > >> release) that allow users to stay with on an older level for upgrade
> > >> purpose. Using the release numbers to encode those "levels" is easy to
> > >> understand for users and should minimize the mental effort to get the
> > >> config right.
> > >>
> > >> Let me know what you think about this.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 5/8/18 3:08 PM, Ted Yu wrote:
> > >>> Bill:That makes sense.
> > >>> Using boolean should suffice.
> > >>> -------- Original message --------From: Bill Bejeck <
> bbejeck@gmail.com
> > >
> > >> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org Subject:
> > Re:
> > >> [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional
> > Topology
> > >> Optimization
> > >>> Thanks for the comments Guozhang and Ted.
> > >>>
> > >>> Guozhang:
> > >>>
> > >>>   1) I'll update the KIP in the "Compatibility, Deprecation and
> > Migration
> > >>> Plan" with the expected impact of turning on optimization. But at
> this
> > >>> point, I have not identified a migration plan that doesn't involve
> > having
> > >>> to stop all instances and restart.
> > >>>
> > >>>   2) Setting the type to String was just so we could have the default
> > of
> > >>> null, indicating run no optimizations. As for partially enabling
> > >>> optimizations, I'm not sure I had that in mind, at least at this
> point.
> > >>>  To me having the topology optimized should be an "all or nothing"
> > >>> proposition.  For now, I'll change the type to boolean (with a
> default
> > >>> value of false) to better reflect the intent of the configuration.
> > >>>
> > >>> Ted, thanks again for the comments.
> > >>>
> > >>> The intent of the new configuration, as I mentioned above, is whether
> > to
> > >>> turn optimization on or off in aggregate.  The main reason for having
> > the
> > >>> configuration is for backward compatibility.  As optimization may
> > result
> > >> in
> > >>> a slightly different topology from the original one, we need to allow
> > >> users
> > >>> to turn it off until they are ready for migrating to the new
> topology.
> > >>>
> > >>> I don't think having to select each optimization is a feasible
> > >> solution.  I
> > >>> say this for few reasons:
> > >>>
> > >>>    1. Maintenance will be an issue.  While the initial target is
> only a
> > >>>    small number of optimizations, but as the number grows, having to
> > >> maintain
> > >>>    the number of options in the code will difficult at best.
> > >>>    2. Users probably don't want to reason about which combination of
> > >>>    optimizations to have.  Thinking about which optimizations to turn
> > on
> > >> or
> > >>>    off raises the complexity of deploying an application.
> > >>>    3. When using a relational database or other another framework
> that
> > >> may
> > >>>    optimize queries, as far as I know, it's not a choice of which
> > >>>    optimizations to have, they are performed automatically.
> > >>>
> > >>> Does this make sense?
> > >>>
> > >>>
> > >>> On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:
> > >>>
> > >>>> There are 4 subtasks for KAFKA-6034
> > >>>>
> > >>>> If each optimization can be switched on/off, there should be 4 enums
> > for
> > >>>> the switch.
> > >>>>
> > >>>> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com>
> > >> wrote:
> > >>>>
> > >>>>> Hello Bill,
> > >>>>>
> > >>>>> Thanks for the KIP. My comments are the following:
> > >>>>>
> > >>>>> 1) We should state clearly what are the expected impact in
> > >>>> "Compatibility,
> > >>>>> Deprecation, and Migration Plan" when optimization is turned on.
> > >>>>>
> > >>>>> 2) I'm wondering why we define this config as a String typed,
> rather
> > >> than
> > >>>>> boolean typed? Or are you expecting to extend it to allow partially
> > >>>>> enabling part of the optimizations in the future?
> > >>>>>
> > >>>>>
> > >>>>> Hi Ted,
> > >>>>>
> > >>>>> The cover JIRA for topology optimization can be found here:
> > >>>>> https://issues.apache.org/jira/browse/KAFKA-6034
> > >>>>>
> > >>>>>
> > >>>>> Guozhang
> > >>>>>
> > >>>>>
> > >>>>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com>
> wrote:
> > >>>>>
> > >>>>>> Which JIRA is for the Topology Optimization itself ?
> > >>>>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> All,
> > >>>>>>> I'd like to start a discussion about adding a configuration
> > parameter
> > >>>>>>> allowing for the forthcoming topology optimization to be optional
> > via
> > >>>>>>> configuration.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> The KIP can be found here:
> > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>> 295%3A+Add+Streams+
> > >>>>>>> Configuration+Allowing+for+Optional+Topology+Optimization
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Bill
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> -- Guozhang
> > >>>>>
> > >>>>
> > >>
> > >>
> > >
> > >
> >
> >
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Bill Bejeck <bb...@gmail.com>.
Thanks, Guozhang and Matthias for the comments.

I was thinking of changing the config type back to a String and enforcing
the values to be "true" or "false", but "none" or "all" is just as good.

Since those values seem to work, I'll update the KIP accordingly.

Thanks,
Bill

On Tue, May 8, 2018 at 9:38 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Sounds good to me.
>
> On 5/8/18 5:47 PM, Guozhang Wang wrote:
> > Thanks Matthias.
> >
> > I was also thinking about whether in the future we'd want to enable
> > optimizations at different levels that may or may not impact
> compatibility.
> > That's why I asked if we have thought about "allowing part of the
> > optimizations in the future".
> >
> > With that in mind, I'd change my preference and take string typed config.
> > Even if we ended up with no finer grained optimizations in the future we
> > can still have the string typed parameter with only two allowed values,
> > like what we did for EOS. But I think in 2.0 allowing any not-null string
> > values as enabled is still a bit odd, so how about we make two string
> > values, like `none` (default value) and `all`?
> >
> >
> > Guozhang
> >
> >
> > On Tue, May 8, 2018 at 3:35 PM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> One thought I want to bring up about switching optimization on/off:
> >>
> >> While for the initial release, a boolean flag seems to be sufficient, I
> >> could imagine that we apply different and potentially
> >> upgrade-incompatible optimizations in future releases. Thus, to me it
> >> would make sense to use a String type, to indicate what optimizations
> >> are possible based on the release. For example, in next release we
> >> accept `null` for disabled and "2.0". If there are any incompatible
> >> changes, people can stay with "2.0" optimizations level when upgrading
> >> to "2.1" while new applications can use "2.1" optimization level. Old
> >> applications would need to do an offline upgrade to get "2.1"
> >> optimizations.
> >>
> >> I agree with Bill, that switching individual optimizations on/off is too
> >> fine grained and hard to maintain. However, for compatibility, it might
> >> make sense, to have certain "levels of optimizations" (based on the
> >> release) that allow users to stay with on an older level for upgrade
> >> purpose. Using the release numbers to encode those "levels" is easy to
> >> understand for users and should minimize the mental effort to get the
> >> config right.
> >>
> >> Let me know what you think about this.
> >>
> >>
> >> -Matthias
> >>
> >> On 5/8/18 3:08 PM, Ted Yu wrote:
> >>> Bill:That makes sense.
> >>> Using boolean should suffice.
> >>> -------- Original message --------From: Bill Bejeck <bbejeck@gmail.com
> >
> >> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org Subject:
> Re:
> >> [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional
> Topology
> >> Optimization
> >>> Thanks for the comments Guozhang and Ted.
> >>>
> >>> Guozhang:
> >>>
> >>>   1) I'll update the KIP in the "Compatibility, Deprecation and
> Migration
> >>> Plan" with the expected impact of turning on optimization. But at this
> >>> point, I have not identified a migration plan that doesn't involve
> having
> >>> to stop all instances and restart.
> >>>
> >>>   2) Setting the type to String was just so we could have the default
> of
> >>> null, indicating run no optimizations. As for partially enabling
> >>> optimizations, I'm not sure I had that in mind, at least at this point.
> >>>  To me having the topology optimized should be an "all or nothing"
> >>> proposition.  For now, I'll change the type to boolean (with a default
> >>> value of false) to better reflect the intent of the configuration.
> >>>
> >>> Ted, thanks again for the comments.
> >>>
> >>> The intent of the new configuration, as I mentioned above, is whether
> to
> >>> turn optimization on or off in aggregate.  The main reason for having
> the
> >>> configuration is for backward compatibility.  As optimization may
> result
> >> in
> >>> a slightly different topology from the original one, we need to allow
> >> users
> >>> to turn it off until they are ready for migrating to the new topology.
> >>>
> >>> I don't think having to select each optimization is a feasible
> >> solution.  I
> >>> say this for few reasons:
> >>>
> >>>    1. Maintenance will be an issue.  While the initial target is only a
> >>>    small number of optimizations, but as the number grows, having to
> >> maintain
> >>>    the number of options in the code will difficult at best.
> >>>    2. Users probably don't want to reason about which combination of
> >>>    optimizations to have.  Thinking about which optimizations to turn
> on
> >> or
> >>>    off raises the complexity of deploying an application.
> >>>    3. When using a relational database or other another framework that
> >> may
> >>>    optimize queries, as far as I know, it's not a choice of which
> >>>    optimizations to have, they are performed automatically.
> >>>
> >>> Does this make sense?
> >>>
> >>>
> >>> On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:
> >>>
> >>>> There are 4 subtasks for KAFKA-6034
> >>>>
> >>>> If each optimization can be switched on/off, there should be 4 enums
> for
> >>>> the switch.
> >>>>
> >>>> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com>
> >> wrote:
> >>>>
> >>>>> Hello Bill,
> >>>>>
> >>>>> Thanks for the KIP. My comments are the following:
> >>>>>
> >>>>> 1) We should state clearly what are the expected impact in
> >>>> "Compatibility,
> >>>>> Deprecation, and Migration Plan" when optimization is turned on.
> >>>>>
> >>>>> 2) I'm wondering why we define this config as a String typed, rather
> >> than
> >>>>> boolean typed? Or are you expecting to extend it to allow partially
> >>>>> enabling part of the optimizations in the future?
> >>>>>
> >>>>>
> >>>>> Hi Ted,
> >>>>>
> >>>>> The cover JIRA for topology optimization can be found here:
> >>>>> https://issues.apache.org/jira/browse/KAFKA-6034
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
> >>>>>
> >>>>>> Which JIRA is for the Topology Optimization itself ?
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> All,
> >>>>>>> I'd like to start a discussion about adding a configuration
> parameter
> >>>>>>> allowing for the forthcoming topology optimization to be optional
> via
> >>>>>>> configuration.
> >>>>>>>
> >>>>>>>
> >>>>>>> The KIP can be found here:
> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>> 295%3A+Add+Streams+
> >>>>>>> Configuration+Allowing+for+Optional+Topology+Optimization
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Bill
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>>
> >>>>
> >>
> >>
> >
> >
>
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Sounds good to me.

On 5/8/18 5:47 PM, Guozhang Wang wrote:
> Thanks Matthias.
> 
> I was also thinking about whether in the future we'd want to enable
> optimizations at different levels that may or may not impact compatibility.
> That's why I asked if we have thought about "allowing part of the
> optimizations in the future".
> 
> With that in mind, I'd change my preference and take string typed config.
> Even if we ended up with no finer grained optimizations in the future we
> can still have the string typed parameter with only two allowed values,
> like what we did for EOS. But I think in 2.0 allowing any not-null string
> values as enabled is still a bit odd, so how about we make two string
> values, like `none` (default value) and `all`?
> 
> 
> Guozhang
> 
> 
> On Tue, May 8, 2018 at 3:35 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> One thought I want to bring up about switching optimization on/off:
>>
>> While for the initial release, a boolean flag seems to be sufficient, I
>> could imagine that we apply different and potentially
>> upgrade-incompatible optimizations in future releases. Thus, to me it
>> would make sense to use a String type, to indicate what optimizations
>> are possible based on the release. For example, in next release we
>> accept `null` for disabled and "2.0". If there are any incompatible
>> changes, people can stay with "2.0" optimizations level when upgrading
>> to "2.1" while new applications can use "2.1" optimization level. Old
>> applications would need to do an offline upgrade to get "2.1"
>> optimizations.
>>
>> I agree with Bill, that switching individual optimizations on/off is too
>> fine grained and hard to maintain. However, for compatibility, it might
>> make sense, to have certain "levels of optimizations" (based on the
>> release) that allow users to stay with on an older level for upgrade
>> purpose. Using the release numbers to encode those "levels" is easy to
>> understand for users and should minimize the mental effort to get the
>> config right.
>>
>> Let me know what you think about this.
>>
>>
>> -Matthias
>>
>> On 5/8/18 3:08 PM, Ted Yu wrote:
>>> Bill:That makes sense.
>>> Using boolean should suffice.
>>> -------- Original message --------From: Bill Bejeck <bb...@gmail.com>
>> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org Subject: Re:
>> [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology
>> Optimization
>>> Thanks for the comments Guozhang and Ted.
>>>
>>> Guozhang:
>>>
>>>   1) I'll update the KIP in the "Compatibility, Deprecation and Migration
>>> Plan" with the expected impact of turning on optimization. But at this
>>> point, I have not identified a migration plan that doesn't involve having
>>> to stop all instances and restart.
>>>
>>>   2) Setting the type to String was just so we could have the default of
>>> null, indicating run no optimizations. As for partially enabling
>>> optimizations, I'm not sure I had that in mind, at least at this point.
>>>  To me having the topology optimized should be an "all or nothing"
>>> proposition.  For now, I'll change the type to boolean (with a default
>>> value of false) to better reflect the intent of the configuration.
>>>
>>> Ted, thanks again for the comments.
>>>
>>> The intent of the new configuration, as I mentioned above, is whether to
>>> turn optimization on or off in aggregate.  The main reason for having the
>>> configuration is for backward compatibility.  As optimization may result
>> in
>>> a slightly different topology from the original one, we need to allow
>> users
>>> to turn it off until they are ready for migrating to the new topology.
>>>
>>> I don't think having to select each optimization is a feasible
>> solution.  I
>>> say this for few reasons:
>>>
>>>    1. Maintenance will be an issue.  While the initial target is only a
>>>    small number of optimizations, but as the number grows, having to
>> maintain
>>>    the number of options in the code will difficult at best.
>>>    2. Users probably don't want to reason about which combination of
>>>    optimizations to have.  Thinking about which optimizations to turn on
>> or
>>>    off raises the complexity of deploying an application.
>>>    3. When using a relational database or other another framework that
>> may
>>>    optimize queries, as far as I know, it's not a choice of which
>>>    optimizations to have, they are performed automatically.
>>>
>>> Does this make sense?
>>>
>>>
>>> On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:
>>>
>>>> There are 4 subtasks for KAFKA-6034
>>>>
>>>> If each optimization can be switched on/off, there should be 4 enums for
>>>> the switch.
>>>>
>>>> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com>
>> wrote:
>>>>
>>>>> Hello Bill,
>>>>>
>>>>> Thanks for the KIP. My comments are the following:
>>>>>
>>>>> 1) We should state clearly what are the expected impact in
>>>> "Compatibility,
>>>>> Deprecation, and Migration Plan" when optimization is turned on.
>>>>>
>>>>> 2) I'm wondering why we define this config as a String typed, rather
>> than
>>>>> boolean typed? Or are you expecting to extend it to allow partially
>>>>> enabling part of the optimizations in the future?
>>>>>
>>>>>
>>>>> Hi Ted,
>>>>>
>>>>> The cover JIRA for topology optimization can be found here:
>>>>> https://issues.apache.org/jira/browse/KAFKA-6034
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
>>>>>
>>>>>> Which JIRA is for the Topology Optimization itself ?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
>>>> wrote:
>>>>>>
>>>>>>> All,
>>>>>>> I'd like to start a discussion about adding a configuration parameter
>>>>>>> allowing for the forthcoming topology optimization to be optional via
>>>>>>> configuration.
>>>>>>>
>>>>>>>
>>>>>>> The KIP can be found here:
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>> 295%3A+Add+Streams+
>>>>>>> Configuration+Allowing+for+Optional+Topology+Optimization
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bill
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -- Guozhang
>>>>>
>>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Matthias.

I was also thinking about whether in the future we'd want to enable
optimizations at different levels that may or may not impact compatibility.
That's why I asked if we have thought about "allowing part of the
optimizations in the future".

With that in mind, I'd change my preference and take string typed config.
Even if we ended up with no finer grained optimizations in the future we
can still have the string typed parameter with only two allowed values,
like what we did for EOS. But I think in 2.0 allowing any not-null string
values as enabled is still a bit odd, so how about we make two string
values, like `none` (default value) and `all`?


Guozhang


On Tue, May 8, 2018 at 3:35 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> One thought I want to bring up about switching optimization on/off:
>
> While for the initial release, a boolean flag seems to be sufficient, I
> could imagine that we apply different and potentially
> upgrade-incompatible optimizations in future releases. Thus, to me it
> would make sense to use a String type, to indicate what optimizations
> are possible based on the release. For example, in next release we
> accept `null` for disabled and "2.0". If there are any incompatible
> changes, people can stay with "2.0" optimizations level when upgrading
> to "2.1" while new applications can use "2.1" optimization level. Old
> applications would need to do an offline upgrade to get "2.1"
> optimizations.
>
> I agree with Bill, that switching individual optimizations on/off is too
> fine grained and hard to maintain. However, for compatibility, it might
> make sense, to have certain "levels of optimizations" (based on the
> release) that allow users to stay with on an older level for upgrade
> purpose. Using the release numbers to encode those "levels" is easy to
> understand for users and should minimize the mental effort to get the
> config right.
>
> Let me know what you think about this.
>
>
> -Matthias
>
> On 5/8/18 3:08 PM, Ted Yu wrote:
> > Bill:That makes sense.
> > Using boolean should suffice.
> > -------- Original message --------From: Bill Bejeck <bb...@gmail.com>
> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org Subject: Re:
> [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology
> Optimization
> > Thanks for the comments Guozhang and Ted.
> >
> > Guozhang:
> >
> >   1) I'll update the KIP in the "Compatibility, Deprecation and Migration
> > Plan" with the expected impact of turning on optimization. But at this
> > point, I have not identified a migration plan that doesn't involve having
> > to stop all instances and restart.
> >
> >   2) Setting the type to String was just so we could have the default of
> > null, indicating run no optimizations. As for partially enabling
> > optimizations, I'm not sure I had that in mind, at least at this point.
> >  To me having the topology optimized should be an "all or nothing"
> > proposition.  For now, I'll change the type to boolean (with a default
> > value of false) to better reflect the intent of the configuration.
> >
> > Ted, thanks again for the comments.
> >
> > The intent of the new configuration, as I mentioned above, is whether to
> > turn optimization on or off in aggregate.  The main reason for having the
> > configuration is for backward compatibility.  As optimization may result
> in
> > a slightly different topology from the original one, we need to allow
> users
> > to turn it off until they are ready for migrating to the new topology.
> >
> > I don't think having to select each optimization is a feasible
> solution.  I
> > say this for few reasons:
> >
> >    1. Maintenance will be an issue.  While the initial target is only a
> >    small number of optimizations, but as the number grows, having to
> maintain
> >    the number of options in the code will difficult at best.
> >    2. Users probably don't want to reason about which combination of
> >    optimizations to have.  Thinking about which optimizations to turn on
> or
> >    off raises the complexity of deploying an application.
> >    3. When using a relational database or other another framework that
> may
> >    optimize queries, as far as I know, it's not a choice of which
> >    optimizations to have, they are performed automatically.
> >
> > Does this make sense?
> >
> >
> > On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> There are 4 subtasks for KAFKA-6034
> >>
> >> If each optimization can be switched on/off, there should be 4 enums for
> >> the switch.
> >>
> >> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com>
> wrote:
> >>
> >>> Hello Bill,
> >>>
> >>> Thanks for the KIP. My comments are the following:
> >>>
> >>> 1) We should state clearly what are the expected impact in
> >> "Compatibility,
> >>> Deprecation, and Migration Plan" when optimization is turned on.
> >>>
> >>> 2) I'm wondering why we define this config as a String typed, rather
> than
> >>> boolean typed? Or are you expecting to extend it to allow partially
> >>> enabling part of the optimizations in the future?
> >>>
> >>>
> >>> Hi Ted,
> >>>
> >>> The cover JIRA for topology optimization can be found here:
> >>> https://issues.apache.org/jira/browse/KAFKA-6034
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
> >>>
> >>>> Which JIRA is for the Topology Optimization itself ?
> >>>>
> >>>> Thanks
> >>>>
> >>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
> >> wrote:
> >>>>
> >>>>> All,
> >>>>> I'd like to start a discussion about adding a configuration parameter
> >>>>> allowing for the forthcoming topology optimization to be optional via
> >>>>> configuration.
> >>>>>
> >>>>>
> >>>>> The KIP can be found here:
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> 295%3A+Add+Streams+
> >>>>> Configuration+Allowing+for+Optional+Topology+Optimization
> >>>>>
> >>>>> Thanks,
> >>>>> Bill
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -- Guozhang
> >>>
> >>
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
One thought I want to bring up about switching optimization on/off:

While for the initial release, a boolean flag seems to be sufficient, I
could imagine that we apply different and potentially
upgrade-incompatible optimizations in future releases. Thus, to me it
would make sense to use a String type, to indicate what optimizations
are possible based on the release. For example, in next release we
accept `null` for disabled and "2.0". If there are any incompatible
changes, people can stay with "2.0" optimizations level when upgrading
to "2.1" while new applications can use "2.1" optimization level. Old
applications would need to do an offline upgrade to get "2.1" optimizations.

I agree with Bill, that switching individual optimizations on/off is too
fine grained and hard to maintain. However, for compatibility, it might
make sense, to have certain "levels of optimizations" (based on the
release) that allow users to stay with on an older level for upgrade
purpose. Using the release numbers to encode those "levels" is easy to
understand for users and should minimize the mental effort to get the
config right.

Let me know what you think about this.


-Matthias

On 5/8/18 3:08 PM, Ted Yu wrote:
> Bill:That makes sense.
> Using boolean should suffice.
> -------- Original message --------From: Bill Bejeck <bb...@gmail.com> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization 
> Thanks for the comments Guozhang and Ted.
> 
> Guozhang:
> 
>   1) I'll update the KIP in the "Compatibility, Deprecation and Migration
> Plan" with the expected impact of turning on optimization. But at this
> point, I have not identified a migration plan that doesn't involve having
> to stop all instances and restart.
> 
>   2) Setting the type to String was just so we could have the default of
> null, indicating run no optimizations. As for partially enabling
> optimizations, I'm not sure I had that in mind, at least at this point.
>  To me having the topology optimized should be an "all or nothing"
> proposition.  For now, I'll change the type to boolean (with a default
> value of false) to better reflect the intent of the configuration.
> 
> Ted, thanks again for the comments.
> 
> The intent of the new configuration, as I mentioned above, is whether to
> turn optimization on or off in aggregate.  The main reason for having the
> configuration is for backward compatibility.  As optimization may result in
> a slightly different topology from the original one, we need to allow users
> to turn it off until they are ready for migrating to the new topology.
> 
> I don't think having to select each optimization is a feasible solution.  I
> say this for few reasons:
> 
>    1. Maintenance will be an issue.  While the initial target is only a
>    small number of optimizations, but as the number grows, having to maintain
>    the number of options in the code will difficult at best.
>    2. Users probably don't want to reason about which combination of
>    optimizations to have.  Thinking about which optimizations to turn on or
>    off raises the complexity of deploying an application.
>    3. When using a relational database or other another framework that may
>    optimize queries, as far as I know, it's not a choice of which
>    optimizations to have, they are performed automatically.
> 
> Does this make sense?
> 
> 
> On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:
> 
>> There are 4 subtasks for KAFKA-6034
>>
>> If each optimization can be switched on/off, there should be 4 enums for
>> the switch.
>>
>> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com> wrote:
>>
>>> Hello Bill,
>>>
>>> Thanks for the KIP. My comments are the following:
>>>
>>> 1) We should state clearly what are the expected impact in
>> "Compatibility,
>>> Deprecation, and Migration Plan" when optimization is turned on.
>>>
>>> 2) I'm wondering why we define this config as a String typed, rather than
>>> boolean typed? Or are you expecting to extend it to allow partially
>>> enabling part of the optimizations in the future?
>>>
>>>
>>> Hi Ted,
>>>
>>> The cover JIRA for topology optimization can be found here:
>>> https://issues.apache.org/jira/browse/KAFKA-6034
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
>>>
>>>> Which JIRA is for the Topology Optimization itself ?
>>>>
>>>> Thanks
>>>>
>>>> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
>> wrote:
>>>>
>>>>> All,
>>>>> I'd like to start a discussion about adding a configuration parameter
>>>>> allowing for the forthcoming topology optimization to be optional via
>>>>> configuration.
>>>>>
>>>>>
>>>>> The KIP can be found here:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 295%3A+Add+Streams+
>>>>> Configuration+Allowing+for+Optional+Topology+Optimization
>>>>>
>>>>> Thanks,
>>>>> Bill
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>


Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Ted Yu <yu...@gmail.com>.
Bill:That makes sense.
Using boolean should suffice.
-------- Original message --------From: Bill Bejeck <bb...@gmail.com> Date: 5/8/18  2:48 PM  (GMT-08:00) To: dev@kafka.apache.org Subject: Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization 
Thanks for the comments Guozhang and Ted.

Guozhang:

  1) I'll update the KIP in the "Compatibility, Deprecation and Migration
Plan" with the expected impact of turning on optimization. But at this
point, I have not identified a migration plan that doesn't involve having
to stop all instances and restart.

  2) Setting the type to String was just so we could have the default of
null, indicating run no optimizations. As for partially enabling
optimizations, I'm not sure I had that in mind, at least at this point.
 To me having the topology optimized should be an "all or nothing"
proposition.  For now, I'll change the type to boolean (with a default
value of false) to better reflect the intent of the configuration.

Ted, thanks again for the comments.

The intent of the new configuration, as I mentioned above, is whether to
turn optimization on or off in aggregate.  The main reason for having the
configuration is for backward compatibility.  As optimization may result in
a slightly different topology from the original one, we need to allow users
to turn it off until they are ready for migrating to the new topology.

I don't think having to select each optimization is a feasible solution.  I
say this for few reasons:

   1. Maintenance will be an issue.  While the initial target is only a
   small number of optimizations, but as the number grows, having to maintain
   the number of options in the code will difficult at best.
   2. Users probably don't want to reason about which combination of
   optimizations to have.  Thinking about which optimizations to turn on or
   off raises the complexity of deploying an application.
   3. When using a relational database or other another framework that may
   optimize queries, as far as I know, it's not a choice of which
   optimizations to have, they are performed automatically.

Does this make sense?


On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:

> There are 4 subtasks for KAFKA-6034
>
> If each optimization can be switched on/off, there should be 4 enums for
> the switch.
>
> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello Bill,
> >
> > Thanks for the KIP. My comments are the following:
> >
> > 1) We should state clearly what are the expected impact in
> "Compatibility,
> > Deprecation, and Migration Plan" when optimization is turned on.
> >
> > 2) I'm wondering why we define this config as a String typed, rather than
> > boolean typed? Or are you expecting to extend it to allow partially
> > enabling part of the optimizations in the future?
> >
> >
> > Hi Ted,
> >
> > The cover JIRA for topology optimization can be found here:
> > https://issues.apache.org/jira/browse/KAFKA-6034
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Which JIRA is for the Topology Optimization itself ?
> > >
> > > Thanks
> > >
> > > On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
> wrote:
> > >
> > > > All,
> > > > I'd like to start a discussion about adding a configuration parameter
> > > > allowing for the forthcoming topology optimization to be optional via
> > > > configuration.
> > > >
> > > >
> > > > The KIP can be found here:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 295%3A+Add+Streams+
> > > > Configuration+Allowing+for+Optional+Topology+Optimization
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Bill Bejeck <bb...@gmail.com>.
Thanks for the comments Guozhang and Ted.

Guozhang:

  1) I'll update the KIP in the "Compatibility, Deprecation and Migration
Plan" with the expected impact of turning on optimization. But at this
point, I have not identified a migration plan that doesn't involve having
to stop all instances and restart.

  2) Setting the type to String was just so we could have the default of
null, indicating run no optimizations. As for partially enabling
optimizations, I'm not sure I had that in mind, at least at this point.
 To me having the topology optimized should be an "all or nothing"
proposition.  For now, I'll change the type to boolean (with a default
value of false) to better reflect the intent of the configuration.

Ted, thanks again for the comments.

The intent of the new configuration, as I mentioned above, is whether to
turn optimization on or off in aggregate.  The main reason for having the
configuration is for backward compatibility.  As optimization may result in
a slightly different topology from the original one, we need to allow users
to turn it off until they are ready for migrating to the new topology.

I don't think having to select each optimization is a feasible solution.  I
say this for few reasons:

   1. Maintenance will be an issue.  While the initial target is only a
   small number of optimizations, but as the number grows, having to maintain
   the number of options in the code will difficult at best.
   2. Users probably don't want to reason about which combination of
   optimizations to have.  Thinking about which optimizations to turn on or
   off raises the complexity of deploying an application.
   3. When using a relational database or other another framework that may
   optimize queries, as far as I know, it's not a choice of which
   optimizations to have, they are performed automatically.

Does this make sense?


On Mon, May 7, 2018 at 7:49 PM, Ted Yu <yu...@gmail.com> wrote:

> There are 4 subtasks for KAFKA-6034
>
> If each optimization can be switched on/off, there should be 4 enums for
> the switch.
>
> On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello Bill,
> >
> > Thanks for the KIP. My comments are the following:
> >
> > 1) We should state clearly what are the expected impact in
> "Compatibility,
> > Deprecation, and Migration Plan" when optimization is turned on.
> >
> > 2) I'm wondering why we define this config as a String typed, rather than
> > boolean typed? Or are you expecting to extend it to allow partially
> > enabling part of the optimizations in the future?
> >
> >
> > Hi Ted,
> >
> > The cover JIRA for topology optimization can be found here:
> > https://issues.apache.org/jira/browse/KAFKA-6034
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Which JIRA is for the Topology Optimization itself ?
> > >
> > > Thanks
> > >
> > > On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com>
> wrote:
> > >
> > > > All,
> > > > I'd like to start a discussion about adding a configuration parameter
> > > > allowing for the forthcoming topology optimization to be optional via
> > > > configuration.
> > > >
> > > >
> > > > The KIP can be found here:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 295%3A+Add+Streams+
> > > > Configuration+Allowing+for+Optional+Topology+Optimization
> > > >
> > > > Thanks,
> > > > Bill
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Ted Yu <yu...@gmail.com>.
There are 4 subtasks for KAFKA-6034

If each optimization can be switched on/off, there should be 4 enums for
the switch.

On Mon, May 7, 2018 at 4:39 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Hello Bill,
>
> Thanks for the KIP. My comments are the following:
>
> 1) We should state clearly what are the expected impact in "Compatibility,
> Deprecation, and Migration Plan" when optimization is turned on.
>
> 2) I'm wondering why we define this config as a String typed, rather than
> boolean typed? Or are you expecting to extend it to allow partially
> enabling part of the optimizations in the future?
>
>
> Hi Ted,
>
> The cover JIRA for topology optimization can be found here:
> https://issues.apache.org/jira/browse/KAFKA-6034
>
>
> Guozhang
>
>
> On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Which JIRA is for the Topology Optimization itself ?
> >
> > Thanks
> >
> > On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com> wrote:
> >
> > > All,
> > > I'd like to start a discussion about adding a configuration parameter
> > > allowing for the forthcoming topology optimization to be optional via
> > > configuration.
> > >
> > >
> > > The KIP can be found here:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 295%3A+Add+Streams+
> > > Configuration+Allowing+for+Optional+Topology+Optimization
> > >
> > > Thanks,
> > > Bill
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Bill,

Thanks for the KIP. My comments are the following:

1) We should state clearly what are the expected impact in "Compatibility,
Deprecation, and Migration Plan" when optimization is turned on.

2) I'm wondering why we define this config as a String typed, rather than
boolean typed? Or are you expecting to extend it to allow partially
enabling part of the optimizations in the future?


Hi Ted,

The cover JIRA for topology optimization can be found here:
https://issues.apache.org/jira/browse/KAFKA-6034


Guozhang


On Mon, May 7, 2018 at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:

> Which JIRA is for the Topology Optimization itself ?
>
> Thanks
>
> On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com> wrote:
>
> > All,
> > I'd like to start a discussion about adding a configuration parameter
> > allowing for the forthcoming topology optimization to be optional via
> > configuration.
> >
> >
> > The KIP can be found here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 295%3A+Add+Streams+
> > Configuration+Allowing+for+Optional+Topology+Optimization
> >
> > Thanks,
> > Bill
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-295: Add Streams Configuration Allowing for Optional Topology Optimization

Posted by Ted Yu <yu...@gmail.com>.
Which JIRA is for the Topology Optimization itself ?

Thanks

On Mon, May 7, 2018 at 10:26 AM, Bill Bejeck <bb...@gmail.com> wrote:

> All,
> I'd like to start a discussion about adding a configuration parameter
> allowing for the forthcoming topology optimization to be optional via
> configuration.
>
>
> The KIP can be found here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-295%3A+Add+Streams+
> Configuration+Allowing+for+Optional+Topology+Optimization
>
> Thanks,
> Bill
>