You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by "Matthias J. Sax" <ma...@confluent.io> on 2017/03/08 22:43:58 UTC

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Hi,

sorry for not replying earlier and thanks for all your feedback. After
some more discussions I updated the KIP. The new proposal puts some
other design considerations into account, that I want to highlight
shortly. Those considerations, automatically resolve the concerns raised.

First some answers:

> The PAPI processors I use in my KStreams app are all functioning on KTable
> internals.  I wouldn't be able to convert them to process()/transform().
> 
> What's the harm in permitting both APIs to be used in the same application?

It's not about "harm" but about design. We want to switch from a
"inheritance" to a "composition" pattern.

About the interface idea: using a shared interface would not help to get
a composition pattern


Next I want to give the design considerations leading to the updated KIP:

1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
KafkaStreams client executes a `Topology` and this execution should be
independent of the way the topology is "put together", ie, low-level API
or DSL.

2) Thus, we don't want to have any changes to KafkaStreams class.

3) Thus, KStreamBuilder needs to have a method `build()` that returns a
`Topology` that can be passed into KafakStreams.

4) Because `KStreamBuilder` should build a `Topology` I suggest to
rename the new class to `StreamsTopologyBuilder` (the name
TopologyBuilder would actually be more natural, but would be easily
confused with old low-level API TopologyBuilder).

Thus, PAPI and DSL can be mixed-and-matched with full power, as
StreamsTopologyBuilder return the created Topology via #build().

I also removed `final` for both builder classes.



With regard to the larger scope of the overal API redesign, I also want
to point to a summary of API issues:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions

Thus, this KIP is only one building block of a larger improvement
effort, and we hope to get as much as possible done for 0.11. If you
have any API improvement ideas, please share them so we can come up with
an holistic sound design (instead of uncoordinated local improvements
that might diverge)



Looking forward to your feedback on this KIP and the other API issues.



-Matthias




On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> - We also removed method #topologyBuilder() from KStreamBuilder because
>> we think #transform() should provide all functionality you need to
>> mix-an-match Processor API and DSL. If there is any further concern
>> about this, please let us know.
>>
> 
> Hi Matthias,
> 
> Yes, I'm sorry I didn't respond sooner, but I still have a lot of concerns
> about this.  You're correct to point out that transform() can be used for
> some of the output situations I pointed out; albeit it seems somewhat
> awkward to do so in a "transform" method; what do you do with the retval?
> 
> The PAPI processors I use in my KStreams app are all functioning on KTable
> internals.  I wouldn't be able to convert them to process()/transform().
> 
> What's the harm in permitting both APIs to be used in the same application?
> 
> Mathieu
> 


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Guozhang,

1) I updated the KIP to option (b).

3) Agreed. So we leave this part out, and tackle it within KIP-130


-Matthias


On 3/12/17 3:48 PM, Guozhang Wang wrote:
> Thanks Matthias.
> 
> 1) Given that TopologyDescription is for debugging purposes before
> `KafkaStreams.start()`. I think the simplest option b) may be sufficient.
> Just needs to emphasize its possible value semantics in Java docs.
> 
> 3) You can tell that I was actually thinking about this together with
> KIP-130. To me if we can expose the runtime information, which is dynamic,
> via metrics in KIP-130 then we could remove this function. The main reason
> is that, again, the task migration make this function's behavior a bit
> difficult to explain. For example:
> 
> streams.start();
> 
> sleep(/* some time */)
> 
> streams.toString();
> 
> ---------------------------
> 
> Even with the same configuration, depending on for how long did you wait
> after started, the function could return very different string results due
> to rebalances.
> 
> That being said, I was not trying to make the decision in this KIP, as I
> saw it more related to KIP-130. So we could probably still keep it as is in
> KIP-120, and consider removing it in KIP-130. That's why I was just "asking
> your thoughts on this", but not necessary wanting to make an action in this
> KIP.
> 
> 
> Guozhang
> 
> 
> 
> On Sat, Mar 11, 2017 at 11:10 AM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> Thanks for your feedback Guozhang.
>>
>>
>> 1) There are multiple ways to do this. Let me know what you think about
>> all options:
>>
>> (a) I updated the KIP to this:
>>
>>>     public final class Source implements Node {
>>>         public final String name;
>>>         // topicNames and topicPattern are mutually exclusive, i.e.,
>> only one will be not-null
>>>         public final List<String> topicNames; // null if #addSource(...,
>> Pattern) was used
>>>         public final Pattern topicPattern; // null if #addSource(...,
>> String...) was used
>>>     }
>>
>> (b) We could also go with a single variable (as originally proposed).
>> This would have the advantage (compared to (a)), that null checks are
>> not required accessing TopologyDescription#Source class.
>>
>>> String topics; // can be comma separated list of topic names or pattern
>> (as String)
>>
>> However, with an encoded list or an encoded pattern it's required to
>> parse the string again, what we want to avoid in the first place.
>>
>> (c) Use a single variable as in (b)
>>
>>> String topics; // always a pattern (as String)
>>
>> We translate a list of topic names into a pattern
>> "topic1|topic2|topic3". We loose the information if the source was added
>> via list or via pattern.
>>
>>
>>
>> 2) Your understanding is correct. Added a comment to the KIP.
>>
>>
>>
>> 3) I would keep KafkaStreams#toString() -- it's conceptually two
>> different things and runtime information is useful, too. But as its
>> return value is ambiguous to parse (and must be parsed in the first
>> place what is cumbersome), we could add KafkaStreams#describe() as
>>
>>> public synchronized KafkaStreamsDescription describe();
>>
>> KafkaStreamsDescription class would be similar to TopologyDescription to
>> allow programmatic access to runtime information. I guess we could even
>> reuse (parts of) TopologyDescription within KafkaStreamsDescription to
>> avoid code duplication.
>>
>> If you think this would be useful, I can extend the KIP accordingly.
>>
>>
>>
>> -Matthias
>>
>>
>>
>>
>> On 3/10/17 1:38 PM, Guozhang Wang wrote:
>>> One more question here:
>>>
>>> 3. with TopologyDescription, do we still want to keep the
>>> `KafkaStream.toString()` function? I think it may still have some
>> advantage
>>> such that it contains tasks information after `KafkaStream#start()` has
>>> been called, but much of it is duplicate with the TopologyDescription,
>> and
>>> it is only in the form of the string hence hard to programmatically
>>> leverage. So would like to hear your thoughts.
>>>
>>> Guozhang
>>>
>>> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wa...@gmail.com>
>> wrote:
>>>
>>>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor
>> comments:
>>>>
>>>> 1. With regard to this class:
>>>>
>>>>     public final class Source implements Node {
>>>>         public final String name;
>>>>         public final String topic; // can be topic name or pattern (as
>>>> String)
>>>>     }
>>>>
>>>> Note that the source node could contain more than a single topic, i.e. a
>>>> list of topics besides a pattern.
>>>>
>>>> 2. With regard to
>>>>
>>>>           public synchronized TopologyDescription describe();
>>>>
>>>> My understand is that whenever the topology is modified, one needs to
>> call
>>>> this function again to get a new description object, as the old one
>> won't
>>>> be updated automatically. Hence the usage pattern would be:
>>>>
>>>> TopologyDescription description = topology.describe();
>>>>
>>>> topology.addProcessor(...)
>>>>
>>>> description = topology.describe(); // have to call again
>>>>
>>>> -----------
>>>>
>>>> Is that right? If yes could you clarify this in the wiki?
>>>>
>>>>
>>>>
>>>> Guozhang
>>>>
>>>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io>
>> wrote:
>>>>
>>>>> Thanks for the update, Matthias.
>>>>>
>>>>> +1 to the points 1,2,3,4 you mentioned.
>>>>>
>>>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>>>> preference towards DslTopologyBuilder, but hey.)  The most important
>>>>> aspect
>>>>> is, IMHO, what you also pointed out:  to make it clear that the current
>>>>> KStreamBuilder actually builds a topology (though currently the latter
>> is
>>>>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <
>> matthias@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> sorry for not replying earlier and thanks for all your feedback. After
>>>>>> some more discussions I updated the KIP. The new proposal puts some
>>>>>> other design considerations into account, that I want to highlight
>>>>>> shortly. Those considerations, automatically resolve the concerns
>>>>> raised.
>>>>>>
>>>>>> First some answers:
>>>>>>
>>>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>>>> KTable
>>>>>>> internals.  I wouldn't be able to convert them to
>>>>> process()/transform().
>>>>>>>
>>>>>>> What's the harm in permitting both APIs to be used in the same
>>>>>> application?
>>>>>>
>>>>>> It's not about "harm" but about design. We want to switch from a
>>>>>> "inheritance" to a "composition" pattern.
>>>>>>
>>>>>> About the interface idea: using a shared interface would not help to
>> get
>>>>>> a composition pattern
>>>>>>
>>>>>>
>>>>>> Next I want to give the design considerations leading to the updated
>>>>> KIP:
>>>>>>
>>>>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is
>> unnatural.
>>>>>> KafkaStreams client executes a `Topology` and this execution should be
>>>>>> independent of the way the topology is "put together", ie, low-level
>> API
>>>>>> or DSL.
>>>>>>
>>>>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
>>>>>>
>>>>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns
>> a
>>>>>> `Topology` that can be passed into KafakStreams.
>>>>>>
>>>>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>>>>>> rename the new class to `StreamsTopologyBuilder` (the name
>>>>>> TopologyBuilder would actually be more natural, but would be easily
>>>>>> confused with old low-level API TopologyBuilder).
>>>>>>
>>>>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
>>>>>> StreamsTopologyBuilder return the created Topology via #build().
>>>>>>
>>>>>> I also removed `final` for both builder classes.
>>>>>>
>>>>>>
>>>>>>
>>>>>> With regard to the larger scope of the overal API redesign, I also
>> want
>>>>>> to point to a summary of API issues:
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>>>> Kafka+Streams+Discussions
>>>>>>
>>>>>> Thus, this KIP is only one building block of a larger improvement
>>>>>> effort, and we hope to get as much as possible done for 0.11. If you
>>>>>> have any API improvement ideas, please share them so we can come up
>> with
>>>>>> an holistic sound design (instead of uncoordinated local improvements
>>>>>> that might diverge)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Looking forward to your feedback on this KIP and the other API issues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>>>>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>>>>> matthias@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
>>>>> because
>>>>>>>> we think #transform() should provide all functionality you need to
>>>>>>>> mix-an-match Processor API and DSL. If there is any further concern
>>>>>>>> about this, please let us know.
>>>>>>>>
>>>>>>>
>>>>>>> Hi Matthias,
>>>>>>>
>>>>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>>>>>> concerns
>>>>>>> about this.  You're correct to point out that transform() can be used
>>>>> for
>>>>>>> some of the output situations I pointed out; albeit it seems somewhat
>>>>>>> awkward to do so in a "transform" method; what do you do with the
>>>>> retval?
>>>>>>>
>>>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>>>> KTable
>>>>>>> internals.  I wouldn't be able to convert them to
>>>>> process()/transform().
>>>>>>>
>>>>>>> What's the harm in permitting both APIs to be used in the same
>>>>>> application?
>>>>>>>
>>>>>>> Mathieu
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>>
>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Matthias.

1) Given that TopologyDescription is for debugging purposes before
`KafkaStreams.start()`. I think the simplest option b) may be sufficient.
Just needs to emphasize its possible value semantics in Java docs.

3) You can tell that I was actually thinking about this together with
KIP-130. To me if we can expose the runtime information, which is dynamic,
via metrics in KIP-130 then we could remove this function. The main reason
is that, again, the task migration make this function's behavior a bit
difficult to explain. For example:

streams.start();

sleep(/* some time */)

streams.toString();

---------------------------

Even with the same configuration, depending on for how long did you wait
after started, the function could return very different string results due
to rebalances.

That being said, I was not trying to make the decision in this KIP, as I
saw it more related to KIP-130. So we could probably still keep it as is in
KIP-120, and consider removing it in KIP-130. That's why I was just "asking
your thoughts on this", but not necessary wanting to make an action in this
KIP.


Guozhang



On Sat, Mar 11, 2017 at 11:10 AM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for your feedback Guozhang.
>
>
> 1) There are multiple ways to do this. Let me know what you think about
> all options:
>
> (a) I updated the KIP to this:
>
> >     public final class Source implements Node {
> >         public final String name;
> >         // topicNames and topicPattern are mutually exclusive, i.e.,
> only one will be not-null
> >         public final List<String> topicNames; // null if #addSource(...,
> Pattern) was used
> >         public final Pattern topicPattern; // null if #addSource(...,
> String...) was used
> >     }
>
> (b) We could also go with a single variable (as originally proposed).
> This would have the advantage (compared to (a)), that null checks are
> not required accessing TopologyDescription#Source class.
>
> > String topics; // can be comma separated list of topic names or pattern
> (as String)
>
> However, with an encoded list or an encoded pattern it's required to
> parse the string again, what we want to avoid in the first place.
>
> (c) Use a single variable as in (b)
>
> > String topics; // always a pattern (as String)
>
> We translate a list of topic names into a pattern
> "topic1|topic2|topic3". We loose the information if the source was added
> via list or via pattern.
>
>
>
> 2) Your understanding is correct. Added a comment to the KIP.
>
>
>
> 3) I would keep KafkaStreams#toString() -- it's conceptually two
> different things and runtime information is useful, too. But as its
> return value is ambiguous to parse (and must be parsed in the first
> place what is cumbersome), we could add KafkaStreams#describe() as
>
> > public synchronized KafkaStreamsDescription describe();
>
> KafkaStreamsDescription class would be similar to TopologyDescription to
> allow programmatic access to runtime information. I guess we could even
> reuse (parts of) TopologyDescription within KafkaStreamsDescription to
> avoid code duplication.
>
> If you think this would be useful, I can extend the KIP accordingly.
>
>
>
> -Matthias
>
>
>
>
> On 3/10/17 1:38 PM, Guozhang Wang wrote:
> > One more question here:
> >
> > 3. with TopologyDescription, do we still want to keep the
> > `KafkaStream.toString()` function? I think it may still have some
> advantage
> > such that it contains tasks information after `KafkaStream#start()` has
> > been called, but much of it is duplicate with the TopologyDescription,
> and
> > it is only in the form of the string hence hard to programmatically
> > leverage. So would like to hear your thoughts.
> >
> > Guozhang
> >
> > On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> >> Thanks Matthias, the updated KIP lgtm overall. A couple of minor
> comments:
> >>
> >> 1. With regard to this class:
> >>
> >>     public final class Source implements Node {
> >>         public final String name;
> >>         public final String topic; // can be topic name or pattern (as
> >> String)
> >>     }
> >>
> >> Note that the source node could contain more than a single topic, i.e. a
> >> list of topics besides a pattern.
> >>
> >> 2. With regard to
> >>
> >>           public synchronized TopologyDescription describe();
> >>
> >> My understand is that whenever the topology is modified, one needs to
> call
> >> this function again to get a new description object, as the old one
> won't
> >> be updated automatically. Hence the usage pattern would be:
> >>
> >> TopologyDescription description = topology.describe();
> >>
> >> topology.addProcessor(...)
> >>
> >> description = topology.describe(); // have to call again
> >>
> >> -----------
> >>
> >> Is that right? If yes could you clarify this in the wiki?
> >>
> >>
> >>
> >> Guozhang
> >>
> >> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io>
> wrote:
> >>
> >>> Thanks for the update, Matthias.
> >>>
> >>> +1 to the points 1,2,3,4 you mentioned.
> >>>
> >>> Naming is always a tricky subject, but renaming KStreamBuilder
> >>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
> >>> preference towards DslTopologyBuilder, but hey.)  The most important
> >>> aspect
> >>> is, IMHO, what you also pointed out:  to make it clear that the current
> >>> KStreamBuilder actually builds a topology (though currently the latter
> is
> >>> actually called `TopologyBuilder` currently), and not a `KStream`.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <
> matthias@confluent.io>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> sorry for not replying earlier and thanks for all your feedback. After
> >>>> some more discussions I updated the KIP. The new proposal puts some
> >>>> other design considerations into account, that I want to highlight
> >>>> shortly. Those considerations, automatically resolve the concerns
> >>> raised.
> >>>>
> >>>> First some answers:
> >>>>
> >>>>> The PAPI processors I use in my KStreams app are all functioning on
> >>>> KTable
> >>>>> internals.  I wouldn't be able to convert them to
> >>> process()/transform().
> >>>>>
> >>>>> What's the harm in permitting both APIs to be used in the same
> >>>> application?
> >>>>
> >>>> It's not about "harm" but about design. We want to switch from a
> >>>> "inheritance" to a "composition" pattern.
> >>>>
> >>>> About the interface idea: using a shared interface would not help to
> get
> >>>> a composition pattern
> >>>>
> >>>>
> >>>> Next I want to give the design considerations leading to the updated
> >>> KIP:
> >>>>
> >>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is
> unnatural.
> >>>> KafkaStreams client executes a `Topology` and this execution should be
> >>>> independent of the way the topology is "put together", ie, low-level
> API
> >>>> or DSL.
> >>>>
> >>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
> >>>>
> >>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns
> a
> >>>> `Topology` that can be passed into KafakStreams.
> >>>>
> >>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> >>>> rename the new class to `StreamsTopologyBuilder` (the name
> >>>> TopologyBuilder would actually be more natural, but would be easily
> >>>> confused with old low-level API TopologyBuilder).
> >>>>
> >>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
> >>>> StreamsTopologyBuilder return the created Topology via #build().
> >>>>
> >>>> I also removed `final` for both builder classes.
> >>>>
> >>>>
> >>>>
> >>>> With regard to the larger scope of the overal API redesign, I also
> want
> >>>> to point to a summary of API issues:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/
> >>>> Kafka+Streams+Discussions
> >>>>
> >>>> Thus, this KIP is only one building block of a larger improvement
> >>>> effort, and we hope to get as much as possible done for 0.11. If you
> >>>> have any API improvement ideas, please share them so we can come up
> with
> >>>> an holistic sound design (instead of uncoordinated local improvements
> >>>> that might diverge)
> >>>>
> >>>>
> >>>>
> >>>> Looking forward to your feedback on this KIP and the other API issues.
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> >>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
> >>> matthias@confluent.io>
> >>>>> wrote:
> >>>>>
> >>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
> >>> because
> >>>>>> we think #transform() should provide all functionality you need to
> >>>>>> mix-an-match Processor API and DSL. If there is any further concern
> >>>>>> about this, please let us know.
> >>>>>>
> >>>>>
> >>>>> Hi Matthias,
> >>>>>
> >>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> >>>> concerns
> >>>>> about this.  You're correct to point out that transform() can be used
> >>> for
> >>>>> some of the output situations I pointed out; albeit it seems somewhat
> >>>>> awkward to do so in a "transform" method; what do you do with the
> >>> retval?
> >>>>>
> >>>>> The PAPI processors I use in my KStreams app are all functioning on
> >>>> KTable
> >>>>> internals.  I wouldn't be able to convert them to
> >>> process()/transform().
> >>>>>
> >>>>> What's the harm in permitting both APIs to be used in the same
> >>>> application?
> >>>>>
> >>>>> Mathieu
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for your feedback Guozhang.


1) There are multiple ways to do this. Let me know what you think about
all options:

(a) I updated the KIP to this:

>     public final class Source implements Node {
>         public final String name;
>         // topicNames and topicPattern are mutually exclusive, i.e., only one will be not-null
>         public final List<String> topicNames; // null if #addSource(..., Pattern) was used
>         public final Pattern topicPattern; // null if #addSource(..., String...) was used
>     }

(b) We could also go with a single variable (as originally proposed).
This would have the advantage (compared to (a)), that null checks are
not required accessing TopologyDescription#Source class.

> String topics; // can be comma separated list of topic names or pattern (as String)

However, with an encoded list or an encoded pattern it's required to
parse the string again, what we want to avoid in the first place.

(c) Use a single variable as in (b)

> String topics; // always a pattern (as String)

We translate a list of topic names into a pattern
"topic1|topic2|topic3". We loose the information if the source was added
via list or via pattern.



2) Your understanding is correct. Added a comment to the KIP.



3) I would keep KafkaStreams#toString() -- it's conceptually two
different things and runtime information is useful, too. But as its
return value is ambiguous to parse (and must be parsed in the first
place what is cumbersome), we could add KafkaStreams#describe() as

> public synchronized KafkaStreamsDescription describe();

KafkaStreamsDescription class would be similar to TopologyDescription to
allow programmatic access to runtime information. I guess we could even
reuse (parts of) TopologyDescription within KafkaStreamsDescription to
avoid code duplication.

If you think this would be useful, I can extend the KIP accordingly.



-Matthias




On 3/10/17 1:38 PM, Guozhang Wang wrote:
> One more question here:
> 
> 3. with TopologyDescription, do we still want to keep the
> `KafkaStream.toString()` function? I think it may still have some advantage
> such that it contains tasks information after `KafkaStream#start()` has
> been called, but much of it is duplicate with the TopologyDescription, and
> it is only in the form of the string hence hard to programmatically
> leverage. So would like to hear your thoughts.
> 
> Guozhang
> 
> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wa...@gmail.com> wrote:
> 
>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>>
>> 1. With regard to this class:
>>
>>     public final class Source implements Node {
>>         public final String name;
>>         public final String topic; // can be topic name or pattern (as
>> String)
>>     }
>>
>> Note that the source node could contain more than a single topic, i.e. a
>> list of topics besides a pattern.
>>
>> 2. With regard to
>>
>>           public synchronized TopologyDescription describe();
>>
>> My understand is that whenever the topology is modified, one needs to call
>> this function again to get a new description object, as the old one won't
>> be updated automatically. Hence the usage pattern would be:
>>
>> TopologyDescription description = topology.describe();
>>
>> topology.addProcessor(...)
>>
>> description = topology.describe(); // have to call again
>>
>> -----------
>>
>> Is that right? If yes could you clarify this in the wiki?
>>
>>
>>
>> Guozhang
>>
>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io> wrote:
>>
>>> Thanks for the update, Matthias.
>>>
>>> +1 to the points 1,2,3,4 you mentioned.
>>>
>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>> preference towards DslTopologyBuilder, but hey.)  The most important
>>> aspect
>>> is, IMHO, what you also pointed out:  to make it clear that the current
>>> KStreamBuilder actually builds a topology (though currently the latter is
>>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry for not replying earlier and thanks for all your feedback. After
>>>> some more discussions I updated the KIP. The new proposal puts some
>>>> other design considerations into account, that I want to highlight
>>>> shortly. Those considerations, automatically resolve the concerns
>>> raised.
>>>>
>>>> First some answers:
>>>>
>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>> KTable
>>>>> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>>>>>
>>>>> What's the harm in permitting both APIs to be used in the same
>>>> application?
>>>>
>>>> It's not about "harm" but about design. We want to switch from a
>>>> "inheritance" to a "composition" pattern.
>>>>
>>>> About the interface idea: using a shared interface would not help to get
>>>> a composition pattern
>>>>
>>>>
>>>> Next I want to give the design considerations leading to the updated
>>> KIP:
>>>>
>>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
>>>> KafkaStreams client executes a `Topology` and this execution should be
>>>> independent of the way the topology is "put together", ie, low-level API
>>>> or DSL.
>>>>
>>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
>>>>
>>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
>>>> `Topology` that can be passed into KafakStreams.
>>>>
>>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>>>> rename the new class to `StreamsTopologyBuilder` (the name
>>>> TopologyBuilder would actually be more natural, but would be easily
>>>> confused with old low-level API TopologyBuilder).
>>>>
>>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
>>>> StreamsTopologyBuilder return the created Topology via #build().
>>>>
>>>> I also removed `final` for both builder classes.
>>>>
>>>>
>>>>
>>>> With regard to the larger scope of the overal API redesign, I also want
>>>> to point to a summary of API issues:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>> Kafka+Streams+Discussions
>>>>
>>>> Thus, this KIP is only one building block of a larger improvement
>>>> effort, and we hope to get as much as possible done for 0.11. If you
>>>> have any API improvement ideas, please share them so we can come up with
>>>> an holistic sound design (instead of uncoordinated local improvements
>>>> that might diverge)
>>>>
>>>>
>>>>
>>>> Looking forward to your feedback on this KIP and the other API issues.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>>
>>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>>> matthias@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
>>> because
>>>>>> we think #transform() should provide all functionality you need to
>>>>>> mix-an-match Processor API and DSL. If there is any further concern
>>>>>> about this, please let us know.
>>>>>>
>>>>>
>>>>> Hi Matthias,
>>>>>
>>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>>>> concerns
>>>>> about this.  You're correct to point out that transform() can be used
>>> for
>>>>> some of the output situations I pointed out; albeit it seems somewhat
>>>>> awkward to do so in a "transform" method; what do you do with the
>>> retval?
>>>>>
>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>> KTable
>>>>> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>>>>>
>>>>> What's the harm in permitting both APIs to be used in the same
>>>> application?
>>>>>
>>>>> Mathieu
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
> 
> 
> 


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for your feedback Guozhang.


1) There are multiple ways to do this. Let me know what you think about
all options:

(a) I updated the KIP to this:

>     public final class Source implements Node {
>         public final String name;
>         // topicNames and topicPattern are mutually exclusive, i.e., only one will be not-null
>         public final List<String> topicNames; // null if #addSource(..., Pattern) was used
>         public final Pattern topicPattern; // null if #addSource(..., String...) was used
>     }

(b) We could also go with a single variable (as originally proposed).
This would have the advantage (compared to (a)), that null checks are
not required accessing TopologyDescription#Source class.

> String topics; // can be comma separated list of topic names or pattern (as String)

However, with an encoded list or an encoded pattern it's required to
parse the string again, what we want to avoid in the first place.

(c) Use a single variable as in (b)

> String topics; // always a pattern (as String)

We translate a list of topic names into a pattern
"topic1|topic2|topic3". We loose the information if the source was added
via list or via pattern.



2) Your understanding is correct. Added a comment to the KIP.



3) I would keep KafkaStreams#toString() -- it's conceptually two
different things and runtime information is useful, too. But as its
return value is ambiguous to parse (and must be parsed in the first
place what is cumbersome), we could add KafkaStreams#describe() as

> public synchronized KafkaStreamsDescription describe();

KafkaStreamsDescription class would be similar to TopologyDescription to
allow programmatic access to runtime information. I guess we could even
reuse (parts of) TopologyDescription within KafkaStreamsDescription to
avoid code duplication.

If you think this would be useful, I can extend the KIP accordingly.



-Matthias




On 3/10/17 1:38 PM, Guozhang Wang wrote:
> One more question here:
> 
> 3. with TopologyDescription, do we still want to keep the
> `KafkaStream.toString()` function? I think it may still have some advantage
> such that it contains tasks information after `KafkaStream#start()` has
> been called, but much of it is duplicate with the TopologyDescription, and
> it is only in the form of the string hence hard to programmatically
> leverage. So would like to hear your thoughts.
> 
> Guozhang
> 
> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wa...@gmail.com> wrote:
> 
>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>>
>> 1. With regard to this class:
>>
>>     public final class Source implements Node {
>>         public final String name;
>>         public final String topic; // can be topic name or pattern (as
>> String)
>>     }
>>
>> Note that the source node could contain more than a single topic, i.e. a
>> list of topics besides a pattern.
>>
>> 2. With regard to
>>
>>           public synchronized TopologyDescription describe();
>>
>> My understand is that whenever the topology is modified, one needs to call
>> this function again to get a new description object, as the old one won't
>> be updated automatically. Hence the usage pattern would be:
>>
>> TopologyDescription description = topology.describe();
>>
>> topology.addProcessor(...)
>>
>> description = topology.describe(); // have to call again
>>
>> -----------
>>
>> Is that right? If yes could you clarify this in the wiki?
>>
>>
>>
>> Guozhang
>>
>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io> wrote:
>>
>>> Thanks for the update, Matthias.
>>>
>>> +1 to the points 1,2,3,4 you mentioned.
>>>
>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>> preference towards DslTopologyBuilder, but hey.)  The most important
>>> aspect
>>> is, IMHO, what you also pointed out:  to make it clear that the current
>>> KStreamBuilder actually builds a topology (though currently the latter is
>>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry for not replying earlier and thanks for all your feedback. After
>>>> some more discussions I updated the KIP. The new proposal puts some
>>>> other design considerations into account, that I want to highlight
>>>> shortly. Those considerations, automatically resolve the concerns
>>> raised.
>>>>
>>>> First some answers:
>>>>
>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>> KTable
>>>>> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>>>>>
>>>>> What's the harm in permitting both APIs to be used in the same
>>>> application?
>>>>
>>>> It's not about "harm" but about design. We want to switch from a
>>>> "inheritance" to a "composition" pattern.
>>>>
>>>> About the interface idea: using a shared interface would not help to get
>>>> a composition pattern
>>>>
>>>>
>>>> Next I want to give the design considerations leading to the updated
>>> KIP:
>>>>
>>>> 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
>>>> KafkaStreams client executes a `Topology` and this execution should be
>>>> independent of the way the topology is "put together", ie, low-level API
>>>> or DSL.
>>>>
>>>> 2) Thus, we don't want to have any changes to KafkaStreams class.
>>>>
>>>> 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
>>>> `Topology` that can be passed into KafakStreams.
>>>>
>>>> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>>>> rename the new class to `StreamsTopologyBuilder` (the name
>>>> TopologyBuilder would actually be more natural, but would be easily
>>>> confused with old low-level API TopologyBuilder).
>>>>
>>>> Thus, PAPI and DSL can be mixed-and-matched with full power, as
>>>> StreamsTopologyBuilder return the created Topology via #build().
>>>>
>>>> I also removed `final` for both builder classes.
>>>>
>>>>
>>>>
>>>> With regard to the larger scope of the overal API redesign, I also want
>>>> to point to a summary of API issues:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>> Kafka+Streams+Discussions
>>>>
>>>> Thus, this KIP is only one building block of a larger improvement
>>>> effort, and we hope to get as much as possible done for 0.11. If you
>>>> have any API improvement ideas, please share them so we can come up with
>>>> an holistic sound design (instead of uncoordinated local improvements
>>>> that might diverge)
>>>>
>>>>
>>>>
>>>> Looking forward to your feedback on this KIP and the other API issues.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>>
>>>> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>>>>> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>>> matthias@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> - We also removed method #topologyBuilder() from KStreamBuilder
>>> because
>>>>>> we think #transform() should provide all functionality you need to
>>>>>> mix-an-match Processor API and DSL. If there is any further concern
>>>>>> about this, please let us know.
>>>>>>
>>>>>
>>>>> Hi Matthias,
>>>>>
>>>>> Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>>>> concerns
>>>>> about this.  You're correct to point out that transform() can be used
>>> for
>>>>> some of the output situations I pointed out; albeit it seems somewhat
>>>>> awkward to do so in a "transform" method; what do you do with the
>>> retval?
>>>>>
>>>>> The PAPI processors I use in my KStreams app are all functioning on
>>>> KTable
>>>>> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>>>>>
>>>>> What's the harm in permitting both APIs to be used in the same
>>>> application?
>>>>>
>>>>> Mathieu
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
> 
> 
> 


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Guozhang Wang <wa...@gmail.com>.
One more question here:

3. with TopologyDescription, do we still want to keep the
`KafkaStream.toString()` function? I think it may still have some advantage
such that it contains tasks information after `KafkaStream#start()` has
been called, but much of it is duplicate with the TopologyDescription, and
it is only in the form of the string hence hard to programmatically
leverage. So would like to hear your thoughts.

Guozhang

On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>
> 1. With regard to this class:
>
>     public final class Source implements Node {
>         public final String name;
>         public final String topic; // can be topic name or pattern (as
> String)
>     }
>
> Note that the source node could contain more than a single topic, i.e. a
> list of topics besides a pattern.
>
> 2. With regard to
>
>           public synchronized TopologyDescription describe();
>
> My understand is that whenever the topology is modified, one needs to call
> this function again to get a new description object, as the old one won't
> be updated automatically. Hence the usage pattern would be:
>
> TopologyDescription description = topology.describe();
>
> topology.addProcessor(...)
>
> description = topology.describe(); // have to call again
>
> -----------
>
> Is that right? If yes could you clarify this in the wiki?
>
>
>
> Guozhang
>
> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io> wrote:
>
>> Thanks for the update, Matthias.
>>
>> +1 to the points 1,2,3,4 you mentioned.
>>
>> Naming is always a tricky subject, but renaming KStreamBuilder
>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>> preference towards DslTopologyBuilder, but hey.)  The most important
>> aspect
>> is, IMHO, what you also pointed out:  to make it clear that the current
>> KStreamBuilder actually builds a topology (though currently the latter is
>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>
>>
>>
>>
>>
>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>> > Hi,
>> >
>> > sorry for not replying earlier and thanks for all your feedback. After
>> > some more discussions I updated the KIP. The new proposal puts some
>> > other design considerations into account, that I want to highlight
>> > shortly. Those considerations, automatically resolve the concerns
>> raised.
>> >
>> > First some answers:
>> >
>> > > The PAPI processors I use in my KStreams app are all functioning on
>> > KTable
>> > > internals.  I wouldn't be able to convert them to
>> process()/transform().
>> > >
>> > > What's the harm in permitting both APIs to be used in the same
>> > application?
>> >
>> > It's not about "harm" but about design. We want to switch from a
>> > "inheritance" to a "composition" pattern.
>> >
>> > About the interface idea: using a shared interface would not help to get
>> > a composition pattern
>> >
>> >
>> > Next I want to give the design considerations leading to the updated
>> KIP:
>> >
>> > 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
>> > KafkaStreams client executes a `Topology` and this execution should be
>> > independent of the way the topology is "put together", ie, low-level API
>> > or DSL.
>> >
>> > 2) Thus, we don't want to have any changes to KafkaStreams class.
>> >
>> > 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
>> > `Topology` that can be passed into KafakStreams.
>> >
>> > 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>> > rename the new class to `StreamsTopologyBuilder` (the name
>> > TopologyBuilder would actually be more natural, but would be easily
>> > confused with old low-level API TopologyBuilder).
>> >
>> > Thus, PAPI and DSL can be mixed-and-matched with full power, as
>> > StreamsTopologyBuilder return the created Topology via #build().
>> >
>> > I also removed `final` for both builder classes.
>> >
>> >
>> >
>> > With regard to the larger scope of the overal API redesign, I also want
>> > to point to a summary of API issues:
>> > https://cwiki.apache.org/confluence/display/KAFKA/
>> > Kafka+Streams+Discussions
>> >
>> > Thus, this KIP is only one building block of a larger improvement
>> > effort, and we hope to get as much as possible done for 0.11. If you
>> > have any API improvement ideas, please share them so we can come up with
>> > an holistic sound design (instead of uncoordinated local improvements
>> > that might diverge)
>> >
>> >
>> >
>> > Looking forward to your feedback on this KIP and the other API issues.
>> >
>> >
>> >
>> > -Matthias
>> >
>> >
>> >
>> >
>> > On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>> > > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>> matthias@confluent.io>
>> > > wrote:
>> > >
>> > >> - We also removed method #topologyBuilder() from KStreamBuilder
>> because
>> > >> we think #transform() should provide all functionality you need to
>> > >> mix-an-match Processor API and DSL. If there is any further concern
>> > >> about this, please let us know.
>> > >>
>> > >
>> > > Hi Matthias,
>> > >
>> > > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>> > concerns
>> > > about this.  You're correct to point out that transform() can be used
>> for
>> > > some of the output situations I pointed out; albeit it seems somewhat
>> > > awkward to do so in a "transform" method; what do you do with the
>> retval?
>> > >
>> > > The PAPI processors I use in my KStreams app are all functioning on
>> > KTable
>> > > internals.  I wouldn't be able to convert them to
>> process()/transform().
>> > >
>> > > What's the harm in permitting both APIs to be used in the same
>> > application?
>> > >
>> > > Mathieu
>> > >
>> >
>> >
>>
>
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang

Fwd: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Guozhang Wang <wa...@gmail.com>.
+dev as well.

---------- Forwarded message ----------
From: Guozhang Wang <wa...@gmail.com>
Date: Thu, Mar 9, 2017 at 11:20 PM
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
To: "users@kafka.apache.org" <us...@kafka.apache.org>


Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:

1. With regard to this class:

    public final class Source implements Node {
        public final String name;
        public final String topic; // can be topic name or pattern (as
String)
    }

Note that the source node could contain more than a single topic, i.e. a
list of topics besides a pattern.

2. With regard to

          public synchronized TopologyDescription describe();

My understand is that whenever the topology is modified, one needs to call
this function again to get a new description object, as the old one won't
be updated automatically. Hence the usage pattern would be:

TopologyDescription description = topology.describe();

topology.addProcessor(...)

description = topology.describe(); // have to call again

-----------

Is that right? If yes could you clarify this in the wiki?



Guozhang

On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io> wrote:

> Thanks for the update, Matthias.
>
> +1 to the points 1,2,3,4 you mentioned.
>
> Naming is always a tricky subject, but renaming KStreamBuilder
> to StreamsTopologyBuilder looks ok to me (I would have had a slight
> preference towards DslTopologyBuilder, but hey.)  The most important aspect
> is, IMHO, what you also pointed out:  to make it clear that the current
> KStreamBuilder actually builds a topology (though currently the latter is
> actually called `TopologyBuilder` currently), and not a `KStream`.
>
>
>
>
>
> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Hi,
> >
> > sorry for not replying earlier and thanks for all your feedback. After
> > some more discussions I updated the KIP. The new proposal puts some
> > other design considerations into account, that I want to highlight
> > shortly. Those considerations, automatically resolve the concerns raised.
> >
> > First some answers:
> >
> > > The PAPI processors I use in my KStreams app are all functioning on
> > KTable
> > > internals.  I wouldn't be able to convert them to
> process()/transform().
> > >
> > > What's the harm in permitting both APIs to be used in the same
> > application?
> >
> > It's not about "harm" but about design. We want to switch from a
> > "inheritance" to a "composition" pattern.
> >
> > About the interface idea: using a shared interface would not help to get
> > a composition pattern
> >
> >
> > Next I want to give the design considerations leading to the updated KIP:
> >
> > 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
> > KafkaStreams client executes a `Topology` and this execution should be
> > independent of the way the topology is "put together", ie, low-level API
> > or DSL.
> >
> > 2) Thus, we don't want to have any changes to KafkaStreams class.
> >
> > 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
> > `Topology` that can be passed into KafakStreams.
> >
> > 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> > rename the new class to `StreamsTopologyBuilder` (the name
> > TopologyBuilder would actually be more natural, but would be easily
> > confused with old low-level API TopologyBuilder).
> >
> > Thus, PAPI and DSL can be mixed-and-matched with full power, as
> > StreamsTopologyBuilder return the created Topology via #build().
> >
> > I also removed `final` for both builder classes.
> >
> >
> >
> > With regard to the larger scope of the overal API redesign, I also want
> > to point to a summary of API issues:
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams+Discussions
> >
> > Thus, this KIP is only one building block of a larger improvement
> > effort, and we hope to get as much as possible done for 0.11. If you
> > have any API improvement ideas, please share them so we can come up with
> > an holistic sound design (instead of uncoordinated local improvements
> > that might diverge)
> >
> >
> >
> > Looking forward to your feedback on this KIP and the other API issues.
> >
> >
> >
> > -Matthias
> >
> >
> >
> >
> > On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> > > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
> matthias@confluent.io>
> > > wrote:
> > >
> > >> - We also removed method #topologyBuilder() from KStreamBuilder
> because
> > >> we think #transform() should provide all functionality you need to
> > >> mix-an-match Processor API and DSL. If there is any further concern
> > >> about this, please let us know.
> > >>
> > >
> > > Hi Matthias,
> > >
> > > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> > concerns
> > > about this.  You're correct to point out that transform() can be used
> for
> > > some of the output situations I pointed out; albeit it seems somewhat
> > > awkward to do so in a "transform" method; what do you do with the
> retval?
> > >
> > > The PAPI processors I use in my KStreams app are all functioning on
> > KTable
> > > internals.  I wouldn't be able to convert them to
> process()/transform().
> > >
> > > What's the harm in permitting both APIs to be used in the same
> > application?
> > >
> > > Mathieu
> > >
> >
> >
>



-- 
-- Guozhang



-- 
-- Guozhang

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Guozhang Wang <wa...@gmail.com>.
One more question here:

3. with TopologyDescription, do we still want to keep the
`KafkaStream.toString()` function? I think it may still have some advantage
such that it contains tasks information after `KafkaStream#start()` has
been called, but much of it is duplicate with the TopologyDescription, and
it is only in the form of the string hence hard to programmatically
leverage. So would like to hear your thoughts.

Guozhang

On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>
> 1. With regard to this class:
>
>     public final class Source implements Node {
>         public final String name;
>         public final String topic; // can be topic name or pattern (as
> String)
>     }
>
> Note that the source node could contain more than a single topic, i.e. a
> list of topics besides a pattern.
>
> 2. With regard to
>
>           public synchronized TopologyDescription describe();
>
> My understand is that whenever the topology is modified, one needs to call
> this function again to get a new description object, as the old one won't
> be updated automatically. Hence the usage pattern would be:
>
> TopologyDescription description = topology.describe();
>
> topology.addProcessor(...)
>
> description = topology.describe(); // have to call again
>
> -----------
>
> Is that right? If yes could you clarify this in the wiki?
>
>
>
> Guozhang
>
> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io> wrote:
>
>> Thanks for the update, Matthias.
>>
>> +1 to the points 1,2,3,4 you mentioned.
>>
>> Naming is always a tricky subject, but renaming KStreamBuilder
>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>> preference towards DslTopologyBuilder, but hey.)  The most important
>> aspect
>> is, IMHO, what you also pointed out:  to make it clear that the current
>> KStreamBuilder actually builds a topology (though currently the latter is
>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>
>>
>>
>>
>>
>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>> > Hi,
>> >
>> > sorry for not replying earlier and thanks for all your feedback. After
>> > some more discussions I updated the KIP. The new proposal puts some
>> > other design considerations into account, that I want to highlight
>> > shortly. Those considerations, automatically resolve the concerns
>> raised.
>> >
>> > First some answers:
>> >
>> > > The PAPI processors I use in my KStreams app are all functioning on
>> > KTable
>> > > internals.  I wouldn't be able to convert them to
>> process()/transform().
>> > >
>> > > What's the harm in permitting both APIs to be used in the same
>> > application?
>> >
>> > It's not about "harm" but about design. We want to switch from a
>> > "inheritance" to a "composition" pattern.
>> >
>> > About the interface idea: using a shared interface would not help to get
>> > a composition pattern
>> >
>> >
>> > Next I want to give the design considerations leading to the updated
>> KIP:
>> >
>> > 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
>> > KafkaStreams client executes a `Topology` and this execution should be
>> > independent of the way the topology is "put together", ie, low-level API
>> > or DSL.
>> >
>> > 2) Thus, we don't want to have any changes to KafkaStreams class.
>> >
>> > 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
>> > `Topology` that can be passed into KafakStreams.
>> >
>> > 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>> > rename the new class to `StreamsTopologyBuilder` (the name
>> > TopologyBuilder would actually be more natural, but would be easily
>> > confused with old low-level API TopologyBuilder).
>> >
>> > Thus, PAPI and DSL can be mixed-and-matched with full power, as
>> > StreamsTopologyBuilder return the created Topology via #build().
>> >
>> > I also removed `final` for both builder classes.
>> >
>> >
>> >
>> > With regard to the larger scope of the overal API redesign, I also want
>> > to point to a summary of API issues:
>> > https://cwiki.apache.org/confluence/display/KAFKA/
>> > Kafka+Streams+Discussions
>> >
>> > Thus, this KIP is only one building block of a larger improvement
>> > effort, and we hope to get as much as possible done for 0.11. If you
>> > have any API improvement ideas, please share them so we can come up with
>> > an holistic sound design (instead of uncoordinated local improvements
>> > that might diverge)
>> >
>> >
>> >
>> > Looking forward to your feedback on this KIP and the other API issues.
>> >
>> >
>> >
>> > -Matthias
>> >
>> >
>> >
>> >
>> > On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>> > > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>> matthias@confluent.io>
>> > > wrote:
>> > >
>> > >> - We also removed method #topologyBuilder() from KStreamBuilder
>> because
>> > >> we think #transform() should provide all functionality you need to
>> > >> mix-an-match Processor API and DSL. If there is any further concern
>> > >> about this, please let us know.
>> > >>
>> > >
>> > > Hi Matthias,
>> > >
>> > > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
>> > concerns
>> > > about this.  You're correct to point out that transform() can be used
>> for
>> > > some of the output situations I pointed out; albeit it seems somewhat
>> > > awkward to do so in a "transform" method; what do you do with the
>> retval?
>> > >
>> > > The PAPI processors I use in my KStreams app are all functioning on
>> > KTable
>> > > internals.  I wouldn't be able to convert them to
>> process()/transform().
>> > >
>> > > What's the harm in permitting both APIs to be used in the same
>> > application?
>> > >
>> > > Mathieu
>> > >
>> >
>> >
>>
>
>
>
> --
> -- Guozhang
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:

1. With regard to this class:

    public final class Source implements Node {
        public final String name;
        public final String topic; // can be topic name or pattern (as
String)
    }

Note that the source node could contain more than a single topic, i.e. a
list of topics besides a pattern.

2. With regard to

          public synchronized TopologyDescription describe();

My understand is that whenever the topology is modified, one needs to call
this function again to get a new description object, as the old one won't
be updated automatically. Hence the usage pattern would be:

TopologyDescription description = topology.describe();

topology.addProcessor(...)

description = topology.describe(); // have to call again

-----------

Is that right? If yes could you clarify this in the wiki?



Guozhang

On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll <mi...@confluent.io> wrote:

> Thanks for the update, Matthias.
>
> +1 to the points 1,2,3,4 you mentioned.
>
> Naming is always a tricky subject, but renaming KStreamBuilder
> to StreamsTopologyBuilder looks ok to me (I would have had a slight
> preference towards DslTopologyBuilder, but hey.)  The most important aspect
> is, IMHO, what you also pointed out:  to make it clear that the current
> KStreamBuilder actually builds a topology (though currently the latter is
> actually called `TopologyBuilder` currently), and not a `KStream`.
>
>
>
>
>
> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Hi,
> >
> > sorry for not replying earlier and thanks for all your feedback. After
> > some more discussions I updated the KIP. The new proposal puts some
> > other design considerations into account, that I want to highlight
> > shortly. Those considerations, automatically resolve the concerns raised.
> >
> > First some answers:
> >
> > > The PAPI processors I use in my KStreams app are all functioning on
> > KTable
> > > internals.  I wouldn't be able to convert them to
> process()/transform().
> > >
> > > What's the harm in permitting both APIs to be used in the same
> > application?
> >
> > It's not about "harm" but about design. We want to switch from a
> > "inheritance" to a "composition" pattern.
> >
> > About the interface idea: using a shared interface would not help to get
> > a composition pattern
> >
> >
> > Next I want to give the design considerations leading to the updated KIP:
> >
> > 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
> > KafkaStreams client executes a `Topology` and this execution should be
> > independent of the way the topology is "put together", ie, low-level API
> > or DSL.
> >
> > 2) Thus, we don't want to have any changes to KafkaStreams class.
> >
> > 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
> > `Topology` that can be passed into KafakStreams.
> >
> > 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> > rename the new class to `StreamsTopologyBuilder` (the name
> > TopologyBuilder would actually be more natural, but would be easily
> > confused with old low-level API TopologyBuilder).
> >
> > Thus, PAPI and DSL can be mixed-and-matched with full power, as
> > StreamsTopologyBuilder return the created Topology via #build().
> >
> > I also removed `final` for both builder classes.
> >
> >
> >
> > With regard to the larger scope of the overal API redesign, I also want
> > to point to a summary of API issues:
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams+Discussions
> >
> > Thus, this KIP is only one building block of a larger improvement
> > effort, and we hope to get as much as possible done for 0.11. If you
> > have any API improvement ideas, please share them so we can come up with
> > an holistic sound design (instead of uncoordinated local improvements
> > that might diverge)
> >
> >
> >
> > Looking forward to your feedback on this KIP and the other API issues.
> >
> >
> >
> > -Matthias
> >
> >
> >
> >
> > On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> > > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
> matthias@confluent.io>
> > > wrote:
> > >
> > >> - We also removed method #topologyBuilder() from KStreamBuilder
> because
> > >> we think #transform() should provide all functionality you need to
> > >> mix-an-match Processor API and DSL. If there is any further concern
> > >> about this, please let us know.
> > >>
> > >
> > > Hi Matthias,
> > >
> > > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> > concerns
> > > about this.  You're correct to point out that transform() can be used
> for
> > > some of the output situations I pointed out; albeit it seems somewhat
> > > awkward to do so in a "transform" method; what do you do with the
> retval?
> > >
> > > The PAPI processors I use in my KStreams app are all functioning on
> > KTable
> > > internals.  I wouldn't be able to convert them to
> process()/transform().
> > >
> > > What's the harm in permitting both APIs to be used in the same
> > application?
> > >
> > > Mathieu
> > >
> >
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Michael Noll <mi...@confluent.io>.
Thanks for the update, Matthias.

+1 to the points 1,2,3,4 you mentioned.

Naming is always a tricky subject, but renaming KStreamBuilder
to StreamsTopologyBuilder looks ok to me (I would have had a slight
preference towards DslTopologyBuilder, but hey.)  The most important aspect
is, IMHO, what you also pointed out:  to make it clear that the current
KStreamBuilder actually builds a topology (though currently the latter is
actually called `TopologyBuilder` currently), and not a `KStream`.





On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Hi,
>
> sorry for not replying earlier and thanks for all your feedback. After
> some more discussions I updated the KIP. The new proposal puts some
> other design considerations into account, that I want to highlight
> shortly. Those considerations, automatically resolve the concerns raised.
>
> First some answers:
>
> > The PAPI processors I use in my KStreams app are all functioning on
> KTable
> > internals.  I wouldn't be able to convert them to process()/transform().
> >
> > What's the harm in permitting both APIs to be used in the same
> application?
>
> It's not about "harm" but about design. We want to switch from a
> "inheritance" to a "composition" pattern.
>
> About the interface idea: using a shared interface would not help to get
> a composition pattern
>
>
> Next I want to give the design considerations leading to the updated KIP:
>
> 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
> KafkaStreams client executes a `Topology` and this execution should be
> independent of the way the topology is "put together", ie, low-level API
> or DSL.
>
> 2) Thus, we don't want to have any changes to KafkaStreams class.
>
> 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
> `Topology` that can be passed into KafakStreams.
>
> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> rename the new class to `StreamsTopologyBuilder` (the name
> TopologyBuilder would actually be more natural, but would be easily
> confused with old low-level API TopologyBuilder).
>
> Thus, PAPI and DSL can be mixed-and-matched with full power, as
> StreamsTopologyBuilder return the created Topology via #build().
>
> I also removed `final` for both builder classes.
>
>
>
> With regard to the larger scope of the overal API redesign, I also want
> to point to a summary of API issues:
> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Discussions
>
> Thus, this KIP is only one building block of a larger improvement
> effort, and we hope to get as much as possible done for 0.11. If you
> have any API improvement ideas, please share them so we can come up with
> an holistic sound design (instead of uncoordinated local improvements
> that might diverge)
>
>
>
> Looking forward to your feedback on this KIP and the other API issues.
>
>
>
> -Matthias
>
>
>
>
> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> - We also removed method #topologyBuilder() from KStreamBuilder because
> >> we think #transform() should provide all functionality you need to
> >> mix-an-match Processor API and DSL. If there is any further concern
> >> about this, please let us know.
> >>
> >
> > Hi Matthias,
> >
> > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> concerns
> > about this.  You're correct to point out that transform() can be used for
> > some of the output situations I pointed out; albeit it seems somewhat
> > awkward to do so in a "transform" method; what do you do with the retval?
> >
> > The PAPI processors I use in my KStreams app are all functioning on
> KTable
> > internals.  I wouldn't be able to convert them to process()/transform().
> >
> > What's the harm in permitting both APIs to be used in the same
> application?
> >
> > Mathieu
> >
>
>

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Posted by Michael Noll <mi...@confluent.io>.
Thanks for the update, Matthias.

+1 to the points 1,2,3,4 you mentioned.

Naming is always a tricky subject, but renaming KStreamBuilder
to StreamsTopologyBuilder looks ok to me (I would have had a slight
preference towards DslTopologyBuilder, but hey.)  The most important aspect
is, IMHO, what you also pointed out:  to make it clear that the current
KStreamBuilder actually builds a topology (though currently the latter is
actually called `TopologyBuilder` currently), and not a `KStream`.





On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Hi,
>
> sorry for not replying earlier and thanks for all your feedback. After
> some more discussions I updated the KIP. The new proposal puts some
> other design considerations into account, that I want to highlight
> shortly. Those considerations, automatically resolve the concerns raised.
>
> First some answers:
>
> > The PAPI processors I use in my KStreams app are all functioning on
> KTable
> > internals.  I wouldn't be able to convert them to process()/transform().
> >
> > What's the harm in permitting both APIs to be used in the same
> application?
>
> It's not about "harm" but about design. We want to switch from a
> "inheritance" to a "composition" pattern.
>
> About the interface idea: using a shared interface would not help to get
> a composition pattern
>
>
> Next I want to give the design considerations leading to the updated KIP:
>
> 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
> KafkaStreams client executes a `Topology` and this execution should be
> independent of the way the topology is "put together", ie, low-level API
> or DSL.
>
> 2) Thus, we don't want to have any changes to KafkaStreams class.
>
> 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
> `Topology` that can be passed into KafakStreams.
>
> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> rename the new class to `StreamsTopologyBuilder` (the name
> TopologyBuilder would actually be more natural, but would be easily
> confused with old low-level API TopologyBuilder).
>
> Thus, PAPI and DSL can be mixed-and-matched with full power, as
> StreamsTopologyBuilder return the created Topology via #build().
>
> I also removed `final` for both builder classes.
>
>
>
> With regard to the larger scope of the overal API redesign, I also want
> to point to a summary of API issues:
> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Discussions
>
> Thus, this KIP is only one building block of a larger improvement
> effort, and we hope to get as much as possible done for 0.11. If you
> have any API improvement ideas, please share them so we can come up with
> an holistic sound design (instead of uncoordinated local improvements
> that might diverge)
>
>
>
> Looking forward to your feedback on this KIP and the other API issues.
>
>
>
> -Matthias
>
>
>
>
> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> - We also removed method #topologyBuilder() from KStreamBuilder because
> >> we think #transform() should provide all functionality you need to
> >> mix-an-match Processor API and DSL. If there is any further concern
> >> about this, please let us know.
> >>
> >
> > Hi Matthias,
> >
> > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> concerns
> > about this.  You're correct to point out that transform() can be used for
> > some of the output situations I pointed out; albeit it seems somewhat
> > awkward to do so in a "transform" method; what do you do with the retval?
> >
> > The PAPI processors I use in my KStreams app are all functioning on
> KTable
> > internals.  I wouldn't be able to convert them to process()/transform().
> >
> > What's the harm in permitting both APIs to be used in the same
> application?
> >
> > Mathieu
> >
>
>