You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jukka Karvanen <ju...@jukinimi.com> on 2019/05/04 06:05:34 UTC

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Hi,

New TestInputTopic and TestOutputTopic included to Developer guide testing
page as alternative,
The old way with ConsumerRecordFactory and OutputVerifier is not removed.

You can see the proposal here in my branch:
https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics


I can create Work In progress pull request if that make commenting proposal
easier.
Still planning to add full coverage unit test and sample WordCountDemoTest to
streams/examples/src/test/java/org/apache/kafka/streams/examples/wordcount,
if this KIP is accepted.

Jukka


ti 30. huhtik. 2019 klo 13.59 Matthias J. Sax (matthias@confluent.io)
kirjoitti:

> KIP-451 was discarded in favor this this KIP. So it seems we are all on
> the same page.
>
>
> >> Relating to KIP-448.
> >> What kind of alignment did you think about?
>
> Nothing in particular. Was more or less a random though. Maybe there is
> nothing to be aligned. Just wanted to bring it up. :)
>
>
> >> Some thoughts after reading also the comments in KAFKA-6460:
> >> To my understand these KIP-448 mock classes need to be fed somehow into
> >> TopologyTestDriver.
> >> I don't know how those KIP-448 mock are planned to be set to
> >> TopologyTestDriver.
>
> KIP-448 is still quite early, and it's a little unclear... Maybe we
> should just ignore it for now. Sorry for the distraction with my comment
> about it.
>
>
> Please give me some more time to review this KIP in detail and I will
> follow up later again.
>
>
> -Matthias
>
> On 4/26/19 2:25 PM, Jukka Karvanen wrote:
> > Yes, my understanding was also that this KIP cover all the requirement of
> > KIP-451.
> >
> > Relating to KIP-448.
> > What kind of alignment did you think about?
> >
> > Some thoughts after reading also the comments in KAFKA-6460:
> > To my understand these KIP-448 mock classes need to be fed somehow into
> > TopologyTestDriver.
> > I don't know how those KIP-448 mock are planned to be set to
> > TopologyTestDriver.
> >
> > On contrast KIP-456 was planned to be on top of unmodified
> > TopologyTestDriver and now driver is set to TestInputTopic and
> > TestOutputTopic in constructor.
> > There are also alternative that these Topic object could be get from
> > TopologyTestDriver, but it would require the duplicating the constructors
> > of Topics as methods to
> > TopologyTestDriver.
> >
> > Also related to those Store object when going through the methods in
> > TopologyTestDriver I noticed accessing the state stores could be be the
> > third candidate for helper class or a group of classes.
> > So addition to have TestInputTopic and TestOutputTopic, it could be also
> > TestKeyValueStore, TestWindowStore, ... to follow the logic in this
> > KPI-456, but
> > it does add not any functionality on top of .already existing
> functionality
> > *Store classes. So that's why I did not include those.
> >
> > Jukka
> > -
> >
> >
> >
> >
> >
> > Not
> >
> > pe 26. huhtik. 2019 klo 12.03 Matthias J. Sax (matthias@confluent.io)
> > kirjoitti:
> >
> >> Btw: there is also KIP-448. I was wondering if it might be required or
> >> helpful to align the design of both with each other. Thoughts?
> >>
> >>
> >>
> >> On 4/25/19 11:22 PM, Matthias J. Sax wrote:
> >>> Thanks for the KIP!
> >>>
> >>> I was just comparing this KIP with KIP-451 (even if I did not yet make
> a
> >>> sorrow read over this KIP), and I agree that there is a big overlap. It
> >>> seems that KIP-456 might subsume KIP-451.
> >>>
> >>> Let's wait on Patrick's input to see how to proceed.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 4/25/19 12:03 AM, Jukka Karvanen wrote:
> >>>> Hi,
> >>>>
> >>>> If you want to see or test the my current idea of the implementation
> of
> >>>> this KIP, you can check it out in my repo:
> >>>>
> >>
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> >>>>
> >>>>
> >>>> After my test with KPI-451  I do not see need for add methods for
> >>>> Iterables, but waiting Patrick's clarification of the use case.
> >>>>
> >>>> Jukka
> >>>>
> >>>>
> >>>> ti 23. huhtik. 2019 klo 15.37 Jukka Karvanen (
> >> jukka.karvanen@jukinimi.com)
> >>>> kirjoitti:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> I would like to start the discussion on KIP-456: Helper classes to
> >> make it
> >>>>> simpler to write test logic with TopologyTestDriver:
> >>>>>
> >>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
> >>>>>
> >>>>>
> >>>>> There is also related KIP adding methods to TopologyTestDriver:
> >>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
> >>>>>
> >>>>>
> >>>>> I added those new Iterable based methods to this TestOutputTopic even
> >> not
> >>>>> tested those myself yet.
> >>>>> So this version contains both my original List and Map based methods
> >> and
> >>>>> these new one.
> >>>>> Based on the discussion some of these can be dropped, if those are
> >> seen as
> >>>>> redundant.
> >>>>>
> >>>>> Best Regards,
> >>>>> Jukka
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Jukka Karvanen <ju...@jukinimi.com>.
Hi All,

Inspired by the discussion in this thread, there is a new KIP-470:
TopologyTestDriver test input and output usability improvements:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-470%3A+TopologyTestDriver+test+input+and+output+usability+improvements


This KIP can be discarded if KIP-470 get accepted.

Even this KIP might be rejected, migrating from this classes to KIP-470 is
rather straighforward.
There are addon package which can be used with any Kafka version
>=1.1.0.before these are included to release.

See info:
https://github.com/jukkakarvanen/kafka-streams-test-topics

Maven package:
https://mvnrepository.com/artifact/com.github.jukkakarvanen/kafka-streams-test-topics


Best Regards,
Jukka

to 9. toukok. 2019 klo 15.51 Patrik Kleindl (pkleindl@gmail.com) kirjoitti:

> Hi Jukka
> Sorry, that was mostly what I had in mind, I didn't have enough time to
> look through the KIP.
>
> My question was also if this handling of topics wouldn't make more sense
> even outside the TTD, for the general API.
>
> regards
> Patrik
>
> On Thu, 9 May 2019 at 14:43, Jukka Karvanen <ju...@jukinimi.com>
> wrote:
>
> > Hi Patrick,
> >
> > Sorry, I need to clarify.
> > In this current version of KIP in wiki, topic object are created with
> > constructor where driver, topicName and serdes are provided.
> >
> > TestInputTopic<String, String> inputTopic = new TestInputTopic<String,
> > String>(testDriver, INPUT_TOPIC, new Serdes.StringSerde(), new
> > Serdes.StringSerde());
> >
> > So if TopologyTestDriver modified, this could be
> >
> > TestInputTopic<String, String> inputTopic =
> > testDriver.getInputTopic(INPUT_TOPIC, new Serdes.StringSerde(), new
> > Serdes.StringSerde());
> >
> > or preferrable if serders can be found:
> >
> > TestInputTopic<String, String> inputTopic =
> > testDriver.getInputTopic(INPUT_TOPIC);
> >
> > This initialization done normally in test setup and after it can be used
> > with topic object:
> >
> > inputTopic.pipeInput("Hello");
> >
> >
> > Or did you mean something else?
> >
> > Jukka
> >
> >
> >
> >
> > to 9. toukok. 2019 klo 15.14 Patrik Kleindl (pkleindl@gmail.com)
> > kirjoitti:
> >
> > > Hi Jukka
> > > Regarding your comment
> > > > If there would be a way to find out needed serders for the topic, it
> > > would make API even simpler.
> > > I was wondering if it wouldn't make more sense to have a "topic object"
> > > including the Serdes and use this instead of only passing in the name
> as
> > a
> > > string everywhere.
> > > From a low-level perspective Kafka does and should not care what is
> > inside
> > > the topic, but from a user perspective this information usually belongs
> > > together.
> > > Sidenote: Having topics as objects would probably also make it easier
> to
> > > track them from the outside.
> > > regards
> > > Patrik
> > >
> > > On Thu, 9 May 2019 at 10:39, Jukka Karvanen <
> jukka.karvanen@jukinimi.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I will write new KIP for the TestTopologyDriver Input and Output
> > > usability
> > > > changes.
> > > > It is out of the scope of the current title: "Helper classes to make
> it
> > > > simpler to write test logic with TopologyTestDriver"
> > > > and we can get back to this KIP if that alternative is not favored.
> > > >
> > > > So my original approach was not to modify existing classes, but if we
> > end
> > > > up modifing TTD, I would also change the
> > > > way to instantiate these topics. We could add
> > getInputTopic("my-topic") /
> > > > getOutputTopic("my-topic") to TTD, so it would work
> > > > same way as with getStateStore and related methods.
> > > >
> > > > If there would be a way to find out needed serders for the topic, it
> > > would
> > > > make API even simpler.
> > > >
> > > > Generally still as a end user, I would prefer not only swapping the
> > > > ConsumerRecord and ProducerRecord, but having
> > > > interface accepting and returning Record, not needing to think about
> > are
> > > > those ConsumerRecord or ProducerRecords.
> > > > and that way would could use same classes to pipe in and assert the
> > > > result.Something similar than  "private final static class Record"
> > > > in TopologyTestDriverTest.
> > > >
> > > > Jukka
> > > >
> > > > ke 8. toukok. 2019 klo 17.01 John Roesler (john@confluent.io)
> > kirjoitti:
> > > >
> > > > > Hi Jukka, thanks for the reply!
> > > > >
> > > > > I think this is a good summary (the discussion was getting a little
> > > > > unwieldy. I'll reply inline.
> > > > >
> > > > > Also, thanks for clarify about your library vs. this KIP. That
> makes
> > > > > perfect sense to me.
> > > > > >
> > > > > > 1. Add JavaDoc for KIP
> > > > > >
> > > > > > Is there a good example of KIP where Javadoc is included, so I
> can
> > > > > follow?
> > > > > > I create this KIP based on this as an example::
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> > > > > >
> > > > > >
> > > > > > Now added some comments to KIP page to clarify timestamp
> handling,
> > > but
> > > > I
> > > > > > did not want to add full JavaDoc of each methods.
> > > > > > Is that enough?
> > > > >
> > > > > That's fine. I was just trying to make the review process more
> > > > > efficient for other reviewers (which makes getting your KIP
> accepted
> > > > > more efficient). I reviewed a few recent KIPs, and, indeed, I see
> > that
> > > > > javadocs are not actually as common as I thought.
> > > > >
> > > > > > 2. TTD usability changes and swapping ConsumerRecord and
> > > ProducerRecord
> > > > > in
> > > > > > APIs
> > > > > >
> > > > > > To my point of view only:
> > > > > > - changing readRecord to return ConsumerRecord would cause we
> > cannot
> > > > use
> > > > > > OutputVerifier
> > > > >
> > > > > Yes, we'd likely have to provide new methods in OutputVerifier to
> > work
> > > > > with ConsumerRecord. If you buy into the plan of deprecating most
> of
> > > > > the current-style interactions, this wouldn't be that confusing,
> > since
> > > > > all the ProducerRecord verifications would be deprecated, and only
> > the
> > > > > ConsumerRecord verifications would remain "live".
> > > > >
> > > > > > - changing pipeInput to take in ProducerRecord, but not providing
> > > easy
> > > > > way
> > > > > > to contruct those like ConsumerRecordFactory
> > > > >
> > > > > I didn't follow this as well. The ConsumerRecordFactory is there
> > > > > because it's a pain to construct ConsumerRecords. Conversely,
> > > > > ProducerRecord has many convenience constructors, so we wouldn't
> need
> > > > > a factory at all. This is a net win for users, since there's less
> > > > > surface area for them to deal with. Under my proposal, we'd
> deprecate
> > > > > the whole ConsumerRecordFactory.
> > > > >
> > > > > Note that there's an "idea parity check" here: ConsumerRecords are
> > > > > hard to construct because developers aren't meant to ever construct
> > > > > them. They are meant to construct ProducerRecords, which is why
> it's
> > > > > made easy. TTD has inverted the relationships of these classes,
> which
> > > > > is why the ConsumerRecordFactory is necessary, but if we correct
> it,
> > > > > and return to a "normal" interaction with the Client library, then
> we
> > > > > don't need special support classes.
> > > > >
> > > > > > - if initializing ConsumerRecord to/from  ProducerRecord  in
> these
> > > > > classes
> > > > > > field by field contructor, there are risk new fields are not
> added
> > to
> > > > > this
> > > > > > classes if there are changes in ProducerRecord or ConsumerRecord
> > > > >
> > > > > This risk seems pretty low, to be honest. We will have tests that
> > > > > exercise this testing framework, so if anyone changes
> ProducerRecord
> > > > > or ConsumerRecord, our tests will break. Since both libraries are
> > > > > build together, the problem would be fixed before the change is
> ever
> > > > > merged to trunk.
> > > > >
> > > > > > I would propose a separate KIP for these and probably other
> > > > > enhanchements:
> > > > > > -superclass or common interface for ConsumerRecord and
> > ProducerRecord
> > > > > > -contructors to ConsumerRecord and ProducerRecord to initialize
> > with
> > > > this
> > > > > > superclass
> > > > > > -modify OutputVerifier to work with both ConsumerRecord and
> > > > > ProducerRecord
> > > > > > -create new RecordFactory to replace ConsumerRecordFactory
> > > > >
> > > > > I understand your motivation to control the scope of this change,
> but
> > > > > I actually think that it's better for user-facing design changes to
> > > > > occur in fewer, bigger, chunks, rather than many small changes.
> > People
> > > > > will get fatigued if multiple releases in a row change the
> > > > > test-support library from under their feet. Better to do it in one
> > > > > shot.
> > > > >
> > > > > Plus, this is a design discussion. We need to include the whole
> scope
> > > > > of the system in the design, or we may realize in Phase 3 that
> there
> > > > > was some design error in Phase 1, since we were only designing
> > > > > locally. This doesn't mean that we only need one Jira ticket, there
> > > > > can be many in support of this KIP, or that we only need one PR,
> it's
> > > > > certainly better to send multiple small PRs to decrease risk and
> ease
> > > > > reviews. But the design discussion doesn't need to be fragmented
> > > > > similarly.
> > > > >
> > > > > > 3. return null vs NoSuchElementException when empty queue
> > > > > >
> > > > > > Should this be also included to the above TTD usability changes?
> > > > > > If single row read methods is changed to throw expectiong, it
> would
> > > > > require
> > > > > > addition of hasRecords to able to verified the empty queue
> > scenarios.
> > > > > > I do not know how to implement it currently without modifying TTD
> > to
> > > > > > provide some kind way to get the queue size or peak items.
> > > > >
> > > > > Yes, it's absolutely within bounds to propose changes to TTD to
> > > > > support the ergonomic API you're proposing.
> > > > >
> > > > > > 4. IllegalArgumentException("topic doesn't exist")
> > > > > > Is this worth separate ticket?
> > > > >
> > > > > This is your call. That was just an idea in response to your
> > > experience.
> > > > >
> > > > > > 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
> > > > > >
> > > > > > I was thinking org.apache.kafka.streams.test where also
> > > OutputVerifier
> > > > > and
> > > > > > ConsumerRecordFactory exist would be more logical place, but
> > > > > > I do not know is there some technical reasons why TTD are in
> > > > > > org.apache.kafka.streams, not in org.apache.kafka.streams.test
> > where
> > > > > other
> > > > > > classes are.
> > > > > >
> > > > > > Did I skip something?
> > > > >
> > > > > Ah, no, you're right. I'm not sure why that is. I admit it's
> > > > > confusing. I don't think the package matters *that* much, just keep
> > it
> > > > > wherever you think is appropriate.
> > > > >
> > > > >
> > > > > That's all! Thanks for entertaining my thoughts.
> > > > > -John
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Patrik Kleindl <pk...@gmail.com>.
Hi Jukka
Sorry, that was mostly what I had in mind, I didn't have enough time to
look through the KIP.

My question was also if this handling of topics wouldn't make more sense
even outside the TTD, for the general API.

regards
Patrik

On Thu, 9 May 2019 at 14:43, Jukka Karvanen <ju...@jukinimi.com>
wrote:

> Hi Patrick,
>
> Sorry, I need to clarify.
> In this current version of KIP in wiki, topic object are created with
> constructor where driver, topicName and serdes are provided.
>
> TestInputTopic<String, String> inputTopic = new TestInputTopic<String,
> String>(testDriver, INPUT_TOPIC, new Serdes.StringSerde(), new
> Serdes.StringSerde());
>
> So if TopologyTestDriver modified, this could be
>
> TestInputTopic<String, String> inputTopic =
> testDriver.getInputTopic(INPUT_TOPIC, new Serdes.StringSerde(), new
> Serdes.StringSerde());
>
> or preferrable if serders can be found:
>
> TestInputTopic<String, String> inputTopic =
> testDriver.getInputTopic(INPUT_TOPIC);
>
> This initialization done normally in test setup and after it can be used
> with topic object:
>
> inputTopic.pipeInput("Hello");
>
>
> Or did you mean something else?
>
> Jukka
>
>
>
>
> to 9. toukok. 2019 klo 15.14 Patrik Kleindl (pkleindl@gmail.com)
> kirjoitti:
>
> > Hi Jukka
> > Regarding your comment
> > > If there would be a way to find out needed serders for the topic, it
> > would make API even simpler.
> > I was wondering if it wouldn't make more sense to have a "topic object"
> > including the Serdes and use this instead of only passing in the name as
> a
> > string everywhere.
> > From a low-level perspective Kafka does and should not care what is
> inside
> > the topic, but from a user perspective this information usually belongs
> > together.
> > Sidenote: Having topics as objects would probably also make it easier to
> > track them from the outside.
> > regards
> > Patrik
> >
> > On Thu, 9 May 2019 at 10:39, Jukka Karvanen <jukka.karvanen@jukinimi.com
> >
> > wrote:
> >
> > > Hi,
> > >
> > > I will write new KIP for the TestTopologyDriver Input and Output
> > usability
> > > changes.
> > > It is out of the scope of the current title: "Helper classes to make it
> > > simpler to write test logic with TopologyTestDriver"
> > > and we can get back to this KIP if that alternative is not favored.
> > >
> > > So my original approach was not to modify existing classes, but if we
> end
> > > up modifing TTD, I would also change the
> > > way to instantiate these topics. We could add
> getInputTopic("my-topic") /
> > > getOutputTopic("my-topic") to TTD, so it would work
> > > same way as with getStateStore and related methods.
> > >
> > > If there would be a way to find out needed serders for the topic, it
> > would
> > > make API even simpler.
> > >
> > > Generally still as a end user, I would prefer not only swapping the
> > > ConsumerRecord and ProducerRecord, but having
> > > interface accepting and returning Record, not needing to think about
> are
> > > those ConsumerRecord or ProducerRecords.
> > > and that way would could use same classes to pipe in and assert the
> > > result.Something similar than  "private final static class Record"
> > > in TopologyTestDriverTest.
> > >
> > > Jukka
> > >
> > > ke 8. toukok. 2019 klo 17.01 John Roesler (john@confluent.io)
> kirjoitti:
> > >
> > > > Hi Jukka, thanks for the reply!
> > > >
> > > > I think this is a good summary (the discussion was getting a little
> > > > unwieldy. I'll reply inline.
> > > >
> > > > Also, thanks for clarify about your library vs. this KIP. That makes
> > > > perfect sense to me.
> > > > >
> > > > > 1. Add JavaDoc for KIP
> > > > >
> > > > > Is there a good example of KIP where Javadoc is included, so I can
> > > > follow?
> > > > > I create this KIP based on this as an example::
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> > > > >
> > > > >
> > > > > Now added some comments to KIP page to clarify timestamp handling,
> > but
> > > I
> > > > > did not want to add full JavaDoc of each methods.
> > > > > Is that enough?
> > > >
> > > > That's fine. I was just trying to make the review process more
> > > > efficient for other reviewers (which makes getting your KIP accepted
> > > > more efficient). I reviewed a few recent KIPs, and, indeed, I see
> that
> > > > javadocs are not actually as common as I thought.
> > > >
> > > > > 2. TTD usability changes and swapping ConsumerRecord and
> > ProducerRecord
> > > > in
> > > > > APIs
> > > > >
> > > > > To my point of view only:
> > > > > - changing readRecord to return ConsumerRecord would cause we
> cannot
> > > use
> > > > > OutputVerifier
> > > >
> > > > Yes, we'd likely have to provide new methods in OutputVerifier to
> work
> > > > with ConsumerRecord. If you buy into the plan of deprecating most of
> > > > the current-style interactions, this wouldn't be that confusing,
> since
> > > > all the ProducerRecord verifications would be deprecated, and only
> the
> > > > ConsumerRecord verifications would remain "live".
> > > >
> > > > > - changing pipeInput to take in ProducerRecord, but not providing
> > easy
> > > > way
> > > > > to contruct those like ConsumerRecordFactory
> > > >
> > > > I didn't follow this as well. The ConsumerRecordFactory is there
> > > > because it's a pain to construct ConsumerRecords. Conversely,
> > > > ProducerRecord has many convenience constructors, so we wouldn't need
> > > > a factory at all. This is a net win for users, since there's less
> > > > surface area for them to deal with. Under my proposal, we'd deprecate
> > > > the whole ConsumerRecordFactory.
> > > >
> > > > Note that there's an "idea parity check" here: ConsumerRecords are
> > > > hard to construct because developers aren't meant to ever construct
> > > > them. They are meant to construct ProducerRecords, which is why it's
> > > > made easy. TTD has inverted the relationships of these classes, which
> > > > is why the ConsumerRecordFactory is necessary, but if we correct it,
> > > > and return to a "normal" interaction with the Client library, then we
> > > > don't need special support classes.
> > > >
> > > > > - if initializing ConsumerRecord to/from  ProducerRecord  in these
> > > > classes
> > > > > field by field contructor, there are risk new fields are not added
> to
> > > > this
> > > > > classes if there are changes in ProducerRecord or ConsumerRecord
> > > >
> > > > This risk seems pretty low, to be honest. We will have tests that
> > > > exercise this testing framework, so if anyone changes ProducerRecord
> > > > or ConsumerRecord, our tests will break. Since both libraries are
> > > > build together, the problem would be fixed before the change is ever
> > > > merged to trunk.
> > > >
> > > > > I would propose a separate KIP for these and probably other
> > > > enhanchements:
> > > > > -superclass or common interface for ConsumerRecord and
> ProducerRecord
> > > > > -contructors to ConsumerRecord and ProducerRecord to initialize
> with
> > > this
> > > > > superclass
> > > > > -modify OutputVerifier to work with both ConsumerRecord and
> > > > ProducerRecord
> > > > > -create new RecordFactory to replace ConsumerRecordFactory
> > > >
> > > > I understand your motivation to control the scope of this change, but
> > > > I actually think that it's better for user-facing design changes to
> > > > occur in fewer, bigger, chunks, rather than many small changes.
> People
> > > > will get fatigued if multiple releases in a row change the
> > > > test-support library from under their feet. Better to do it in one
> > > > shot.
> > > >
> > > > Plus, this is a design discussion. We need to include the whole scope
> > > > of the system in the design, or we may realize in Phase 3 that there
> > > > was some design error in Phase 1, since we were only designing
> > > > locally. This doesn't mean that we only need one Jira ticket, there
> > > > can be many in support of this KIP, or that we only need one PR, it's
> > > > certainly better to send multiple small PRs to decrease risk and ease
> > > > reviews. But the design discussion doesn't need to be fragmented
> > > > similarly.
> > > >
> > > > > 3. return null vs NoSuchElementException when empty queue
> > > > >
> > > > > Should this be also included to the above TTD usability changes?
> > > > > If single row read methods is changed to throw expectiong, it would
> > > > require
> > > > > addition of hasRecords to able to verified the empty queue
> scenarios.
> > > > > I do not know how to implement it currently without modifying TTD
> to
> > > > > provide some kind way to get the queue size or peak items.
> > > >
> > > > Yes, it's absolutely within bounds to propose changes to TTD to
> > > > support the ergonomic API you're proposing.
> > > >
> > > > > 4. IllegalArgumentException("topic doesn't exist")
> > > > > Is this worth separate ticket?
> > > >
> > > > This is your call. That was just an idea in response to your
> > experience.
> > > >
> > > > > 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
> > > > >
> > > > > I was thinking org.apache.kafka.streams.test where also
> > OutputVerifier
> > > > and
> > > > > ConsumerRecordFactory exist would be more logical place, but
> > > > > I do not know is there some technical reasons why TTD are in
> > > > > org.apache.kafka.streams, not in org.apache.kafka.streams.test
> where
> > > > other
> > > > > classes are.
> > > > >
> > > > > Did I skip something?
> > > >
> > > > Ah, no, you're right. I'm not sure why that is. I admit it's
> > > > confusing. I don't think the package matters *that* much, just keep
> it
> > > > wherever you think is appropriate.
> > > >
> > > >
> > > > That's all! Thanks for entertaining my thoughts.
> > > > -John
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Jukka Karvanen <ju...@jukinimi.com>.
Hi Patrick,

Sorry, I need to clarify.
In this current version of KIP in wiki, topic object are created with
constructor where driver, topicName and serdes are provided.

TestInputTopic<String, String> inputTopic = new TestInputTopic<String,
String>(testDriver, INPUT_TOPIC, new Serdes.StringSerde(), new
Serdes.StringSerde());

So if TopologyTestDriver modified, this could be

TestInputTopic<String, String> inputTopic =
testDriver.getInputTopic(INPUT_TOPIC, new Serdes.StringSerde(), new
Serdes.StringSerde());

or preferrable if serders can be found:

TestInputTopic<String, String> inputTopic =
testDriver.getInputTopic(INPUT_TOPIC);

This initialization done normally in test setup and after it can be used
with topic object:

inputTopic.pipeInput("Hello");


Or did you mean something else?

Jukka




to 9. toukok. 2019 klo 15.14 Patrik Kleindl (pkleindl@gmail.com) kirjoitti:

> Hi Jukka
> Regarding your comment
> > If there would be a way to find out needed serders for the topic, it
> would make API even simpler.
> I was wondering if it wouldn't make more sense to have a "topic object"
> including the Serdes and use this instead of only passing in the name as a
> string everywhere.
> From a low-level perspective Kafka does and should not care what is inside
> the topic, but from a user perspective this information usually belongs
> together.
> Sidenote: Having topics as objects would probably also make it easier to
> track them from the outside.
> regards
> Patrik
>
> On Thu, 9 May 2019 at 10:39, Jukka Karvanen <ju...@jukinimi.com>
> wrote:
>
> > Hi,
> >
> > I will write new KIP for the TestTopologyDriver Input and Output
> usability
> > changes.
> > It is out of the scope of the current title: "Helper classes to make it
> > simpler to write test logic with TopologyTestDriver"
> > and we can get back to this KIP if that alternative is not favored.
> >
> > So my original approach was not to modify existing classes, but if we end
> > up modifing TTD, I would also change the
> > way to instantiate these topics. We could add getInputTopic("my-topic") /
> > getOutputTopic("my-topic") to TTD, so it would work
> > same way as with getStateStore and related methods.
> >
> > If there would be a way to find out needed serders for the topic, it
> would
> > make API even simpler.
> >
> > Generally still as a end user, I would prefer not only swapping the
> > ConsumerRecord and ProducerRecord, but having
> > interface accepting and returning Record, not needing to think about are
> > those ConsumerRecord or ProducerRecords.
> > and that way would could use same classes to pipe in and assert the
> > result.Something similar than  "private final static class Record"
> > in TopologyTestDriverTest.
> >
> > Jukka
> >
> > ke 8. toukok. 2019 klo 17.01 John Roesler (john@confluent.io) kirjoitti:
> >
> > > Hi Jukka, thanks for the reply!
> > >
> > > I think this is a good summary (the discussion was getting a little
> > > unwieldy. I'll reply inline.
> > >
> > > Also, thanks for clarify about your library vs. this KIP. That makes
> > > perfect sense to me.
> > > >
> > > > 1. Add JavaDoc for KIP
> > > >
> > > > Is there a good example of KIP where Javadoc is included, so I can
> > > follow?
> > > > I create this KIP based on this as an example::
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> > > >
> > > >
> > > > Now added some comments to KIP page to clarify timestamp handling,
> but
> > I
> > > > did not want to add full JavaDoc of each methods.
> > > > Is that enough?
> > >
> > > That's fine. I was just trying to make the review process more
> > > efficient for other reviewers (which makes getting your KIP accepted
> > > more efficient). I reviewed a few recent KIPs, and, indeed, I see that
> > > javadocs are not actually as common as I thought.
> > >
> > > > 2. TTD usability changes and swapping ConsumerRecord and
> ProducerRecord
> > > in
> > > > APIs
> > > >
> > > > To my point of view only:
> > > > - changing readRecord to return ConsumerRecord would cause we cannot
> > use
> > > > OutputVerifier
> > >
> > > Yes, we'd likely have to provide new methods in OutputVerifier to work
> > > with ConsumerRecord. If you buy into the plan of deprecating most of
> > > the current-style interactions, this wouldn't be that confusing, since
> > > all the ProducerRecord verifications would be deprecated, and only the
> > > ConsumerRecord verifications would remain "live".
> > >
> > > > - changing pipeInput to take in ProducerRecord, but not providing
> easy
> > > way
> > > > to contruct those like ConsumerRecordFactory
> > >
> > > I didn't follow this as well. The ConsumerRecordFactory is there
> > > because it's a pain to construct ConsumerRecords. Conversely,
> > > ProducerRecord has many convenience constructors, so we wouldn't need
> > > a factory at all. This is a net win for users, since there's less
> > > surface area for them to deal with. Under my proposal, we'd deprecate
> > > the whole ConsumerRecordFactory.
> > >
> > > Note that there's an "idea parity check" here: ConsumerRecords are
> > > hard to construct because developers aren't meant to ever construct
> > > them. They are meant to construct ProducerRecords, which is why it's
> > > made easy. TTD has inverted the relationships of these classes, which
> > > is why the ConsumerRecordFactory is necessary, but if we correct it,
> > > and return to a "normal" interaction with the Client library, then we
> > > don't need special support classes.
> > >
> > > > - if initializing ConsumerRecord to/from  ProducerRecord  in these
> > > classes
> > > > field by field contructor, there are risk new fields are not added to
> > > this
> > > > classes if there are changes in ProducerRecord or ConsumerRecord
> > >
> > > This risk seems pretty low, to be honest. We will have tests that
> > > exercise this testing framework, so if anyone changes ProducerRecord
> > > or ConsumerRecord, our tests will break. Since both libraries are
> > > build together, the problem would be fixed before the change is ever
> > > merged to trunk.
> > >
> > > > I would propose a separate KIP for these and probably other
> > > enhanchements:
> > > > -superclass or common interface for ConsumerRecord and ProducerRecord
> > > > -contructors to ConsumerRecord and ProducerRecord to initialize with
> > this
> > > > superclass
> > > > -modify OutputVerifier to work with both ConsumerRecord and
> > > ProducerRecord
> > > > -create new RecordFactory to replace ConsumerRecordFactory
> > >
> > > I understand your motivation to control the scope of this change, but
> > > I actually think that it's better for user-facing design changes to
> > > occur in fewer, bigger, chunks, rather than many small changes. People
> > > will get fatigued if multiple releases in a row change the
> > > test-support library from under their feet. Better to do it in one
> > > shot.
> > >
> > > Plus, this is a design discussion. We need to include the whole scope
> > > of the system in the design, or we may realize in Phase 3 that there
> > > was some design error in Phase 1, since we were only designing
> > > locally. This doesn't mean that we only need one Jira ticket, there
> > > can be many in support of this KIP, or that we only need one PR, it's
> > > certainly better to send multiple small PRs to decrease risk and ease
> > > reviews. But the design discussion doesn't need to be fragmented
> > > similarly.
> > >
> > > > 3. return null vs NoSuchElementException when empty queue
> > > >
> > > > Should this be also included to the above TTD usability changes?
> > > > If single row read methods is changed to throw expectiong, it would
> > > require
> > > > addition of hasRecords to able to verified the empty queue scenarios.
> > > > I do not know how to implement it currently without modifying TTD to
> > > > provide some kind way to get the queue size or peak items.
> > >
> > > Yes, it's absolutely within bounds to propose changes to TTD to
> > > support the ergonomic API you're proposing.
> > >
> > > > 4. IllegalArgumentException("topic doesn't exist")
> > > > Is this worth separate ticket?
> > >
> > > This is your call. That was just an idea in response to your
> experience.
> > >
> > > > 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
> > > >
> > > > I was thinking org.apache.kafka.streams.test where also
> OutputVerifier
> > > and
> > > > ConsumerRecordFactory exist would be more logical place, but
> > > > I do not know is there some technical reasons why TTD are in
> > > > org.apache.kafka.streams, not in org.apache.kafka.streams.test where
> > > other
> > > > classes are.
> > > >
> > > > Did I skip something?
> > >
> > > Ah, no, you're right. I'm not sure why that is. I admit it's
> > > confusing. I don't think the package matters *that* much, just keep it
> > > wherever you think is appropriate.
> > >
> > >
> > > That's all! Thanks for entertaining my thoughts.
> > > -John
> > >
> >
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Patrik Kleindl <pk...@gmail.com>.
Hi Jukka
Regarding your comment
> If there would be a way to find out needed serders for the topic, it
would make API even simpler.
I was wondering if it wouldn't make more sense to have a "topic object"
including the Serdes and use this instead of only passing in the name as a
string everywhere.
From a low-level perspective Kafka does and should not care what is inside
the topic, but from a user perspective this information usually belongs
together.
Sidenote: Having topics as objects would probably also make it easier to
track them from the outside.
regards
Patrik

On Thu, 9 May 2019 at 10:39, Jukka Karvanen <ju...@jukinimi.com>
wrote:

> Hi,
>
> I will write new KIP for the TestTopologyDriver Input and Output usability
> changes.
> It is out of the scope of the current title: "Helper classes to make it
> simpler to write test logic with TopologyTestDriver"
> and we can get back to this KIP if that alternative is not favored.
>
> So my original approach was not to modify existing classes, but if we end
> up modifing TTD, I would also change the
> way to instantiate these topics. We could add getInputTopic("my-topic") /
> getOutputTopic("my-topic") to TTD, so it would work
> same way as with getStateStore and related methods.
>
> If there would be a way to find out needed serders for the topic, it would
> make API even simpler.
>
> Generally still as a end user, I would prefer not only swapping the
> ConsumerRecord and ProducerRecord, but having
> interface accepting and returning Record, not needing to think about are
> those ConsumerRecord or ProducerRecords.
> and that way would could use same classes to pipe in and assert the
> result.Something similar than  "private final static class Record"
> in TopologyTestDriverTest.
>
> Jukka
>
> ke 8. toukok. 2019 klo 17.01 John Roesler (john@confluent.io) kirjoitti:
>
> > Hi Jukka, thanks for the reply!
> >
> > I think this is a good summary (the discussion was getting a little
> > unwieldy. I'll reply inline.
> >
> > Also, thanks for clarify about your library vs. this KIP. That makes
> > perfect sense to me.
> > >
> > > 1. Add JavaDoc for KIP
> > >
> > > Is there a good example of KIP where Javadoc is included, so I can
> > follow?
> > > I create this KIP based on this as an example::
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> > >
> > >
> > > Now added some comments to KIP page to clarify timestamp handling, but
> I
> > > did not want to add full JavaDoc of each methods.
> > > Is that enough?
> >
> > That's fine. I was just trying to make the review process more
> > efficient for other reviewers (which makes getting your KIP accepted
> > more efficient). I reviewed a few recent KIPs, and, indeed, I see that
> > javadocs are not actually as common as I thought.
> >
> > > 2. TTD usability changes and swapping ConsumerRecord and ProducerRecord
> > in
> > > APIs
> > >
> > > To my point of view only:
> > > - changing readRecord to return ConsumerRecord would cause we cannot
> use
> > > OutputVerifier
> >
> > Yes, we'd likely have to provide new methods in OutputVerifier to work
> > with ConsumerRecord. If you buy into the plan of deprecating most of
> > the current-style interactions, this wouldn't be that confusing, since
> > all the ProducerRecord verifications would be deprecated, and only the
> > ConsumerRecord verifications would remain "live".
> >
> > > - changing pipeInput to take in ProducerRecord, but not providing easy
> > way
> > > to contruct those like ConsumerRecordFactory
> >
> > I didn't follow this as well. The ConsumerRecordFactory is there
> > because it's a pain to construct ConsumerRecords. Conversely,
> > ProducerRecord has many convenience constructors, so we wouldn't need
> > a factory at all. This is a net win for users, since there's less
> > surface area for them to deal with. Under my proposal, we'd deprecate
> > the whole ConsumerRecordFactory.
> >
> > Note that there's an "idea parity check" here: ConsumerRecords are
> > hard to construct because developers aren't meant to ever construct
> > them. They are meant to construct ProducerRecords, which is why it's
> > made easy. TTD has inverted the relationships of these classes, which
> > is why the ConsumerRecordFactory is necessary, but if we correct it,
> > and return to a "normal" interaction with the Client library, then we
> > don't need special support classes.
> >
> > > - if initializing ConsumerRecord to/from  ProducerRecord  in these
> > classes
> > > field by field contructor, there are risk new fields are not added to
> > this
> > > classes if there are changes in ProducerRecord or ConsumerRecord
> >
> > This risk seems pretty low, to be honest. We will have tests that
> > exercise this testing framework, so if anyone changes ProducerRecord
> > or ConsumerRecord, our tests will break. Since both libraries are
> > build together, the problem would be fixed before the change is ever
> > merged to trunk.
> >
> > > I would propose a separate KIP for these and probably other
> > enhanchements:
> > > -superclass or common interface for ConsumerRecord and ProducerRecord
> > > -contructors to ConsumerRecord and ProducerRecord to initialize with
> this
> > > superclass
> > > -modify OutputVerifier to work with both ConsumerRecord and
> > ProducerRecord
> > > -create new RecordFactory to replace ConsumerRecordFactory
> >
> > I understand your motivation to control the scope of this change, but
> > I actually think that it's better for user-facing design changes to
> > occur in fewer, bigger, chunks, rather than many small changes. People
> > will get fatigued if multiple releases in a row change the
> > test-support library from under their feet. Better to do it in one
> > shot.
> >
> > Plus, this is a design discussion. We need to include the whole scope
> > of the system in the design, or we may realize in Phase 3 that there
> > was some design error in Phase 1, since we were only designing
> > locally. This doesn't mean that we only need one Jira ticket, there
> > can be many in support of this KIP, or that we only need one PR, it's
> > certainly better to send multiple small PRs to decrease risk and ease
> > reviews. But the design discussion doesn't need to be fragmented
> > similarly.
> >
> > > 3. return null vs NoSuchElementException when empty queue
> > >
> > > Should this be also included to the above TTD usability changes?
> > > If single row read methods is changed to throw expectiong, it would
> > require
> > > addition of hasRecords to able to verified the empty queue scenarios.
> > > I do not know how to implement it currently without modifying TTD to
> > > provide some kind way to get the queue size or peak items.
> >
> > Yes, it's absolutely within bounds to propose changes to TTD to
> > support the ergonomic API you're proposing.
> >
> > > 4. IllegalArgumentException("topic doesn't exist")
> > > Is this worth separate ticket?
> >
> > This is your call. That was just an idea in response to your experience.
> >
> > > 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
> > >
> > > I was thinking org.apache.kafka.streams.test where also OutputVerifier
> > and
> > > ConsumerRecordFactory exist would be more logical place, but
> > > I do not know is there some technical reasons why TTD are in
> > > org.apache.kafka.streams, not in org.apache.kafka.streams.test where
> > other
> > > classes are.
> > >
> > > Did I skip something?
> >
> > Ah, no, you're right. I'm not sure why that is. I admit it's
> > confusing. I don't think the package matters *that* much, just keep it
> > wherever you think is appropriate.
> >
> >
> > That's all! Thanks for entertaining my thoughts.
> > -John
> >
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Jukka Karvanen <ju...@jukinimi.com>.
Hi,

I will write new KIP for the TestTopologyDriver Input and Output usability
changes.
It is out of the scope of the current title: "Helper classes to make it
simpler to write test logic with TopologyTestDriver"
and we can get back to this KIP if that alternative is not favored.

So my original approach was not to modify existing classes, but if we end
up modifing TTD, I would also change the
way to instantiate these topics. We could add getInputTopic("my-topic") /
getOutputTopic("my-topic") to TTD, so it would work
same way as with getStateStore and related methods.

If there would be a way to find out needed serders for the topic, it would
make API even simpler.

Generally still as a end user, I would prefer not only swapping the
ConsumerRecord and ProducerRecord, but having
interface accepting and returning Record, not needing to think about are
those ConsumerRecord or ProducerRecords.
and that way would could use same classes to pipe in and assert the
result.Something similar than  "private final static class Record"
in TopologyTestDriverTest.

Jukka

ke 8. toukok. 2019 klo 17.01 John Roesler (john@confluent.io) kirjoitti:

> Hi Jukka, thanks for the reply!
>
> I think this is a good summary (the discussion was getting a little
> unwieldy. I'll reply inline.
>
> Also, thanks for clarify about your library vs. this KIP. That makes
> perfect sense to me.
> >
> > 1. Add JavaDoc for KIP
> >
> > Is there a good example of KIP where Javadoc is included, so I can
> follow?
> > I create this KIP based on this as an example::
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
> >
> >
> > Now added some comments to KIP page to clarify timestamp handling, but I
> > did not want to add full JavaDoc of each methods.
> > Is that enough?
>
> That's fine. I was just trying to make the review process more
> efficient for other reviewers (which makes getting your KIP accepted
> more efficient). I reviewed a few recent KIPs, and, indeed, I see that
> javadocs are not actually as common as I thought.
>
> > 2. TTD usability changes and swapping ConsumerRecord and ProducerRecord
> in
> > APIs
> >
> > To my point of view only:
> > - changing readRecord to return ConsumerRecord would cause we cannot use
> > OutputVerifier
>
> Yes, we'd likely have to provide new methods in OutputVerifier to work
> with ConsumerRecord. If you buy into the plan of deprecating most of
> the current-style interactions, this wouldn't be that confusing, since
> all the ProducerRecord verifications would be deprecated, and only the
> ConsumerRecord verifications would remain "live".
>
> > - changing pipeInput to take in ProducerRecord, but not providing easy
> way
> > to contruct those like ConsumerRecordFactory
>
> I didn't follow this as well. The ConsumerRecordFactory is there
> because it's a pain to construct ConsumerRecords. Conversely,
> ProducerRecord has many convenience constructors, so we wouldn't need
> a factory at all. This is a net win for users, since there's less
> surface area for them to deal with. Under my proposal, we'd deprecate
> the whole ConsumerRecordFactory.
>
> Note that there's an "idea parity check" here: ConsumerRecords are
> hard to construct because developers aren't meant to ever construct
> them. They are meant to construct ProducerRecords, which is why it's
> made easy. TTD has inverted the relationships of these classes, which
> is why the ConsumerRecordFactory is necessary, but if we correct it,
> and return to a "normal" interaction with the Client library, then we
> don't need special support classes.
>
> > - if initializing ConsumerRecord to/from  ProducerRecord  in these
> classes
> > field by field contructor, there are risk new fields are not added to
> this
> > classes if there are changes in ProducerRecord or ConsumerRecord
>
> This risk seems pretty low, to be honest. We will have tests that
> exercise this testing framework, so if anyone changes ProducerRecord
> or ConsumerRecord, our tests will break. Since both libraries are
> build together, the problem would be fixed before the change is ever
> merged to trunk.
>
> > I would propose a separate KIP for these and probably other
> enhanchements:
> > -superclass or common interface for ConsumerRecord and ProducerRecord
> > -contructors to ConsumerRecord and ProducerRecord to initialize with this
> > superclass
> > -modify OutputVerifier to work with both ConsumerRecord and
> ProducerRecord
> > -create new RecordFactory to replace ConsumerRecordFactory
>
> I understand your motivation to control the scope of this change, but
> I actually think that it's better for user-facing design changes to
> occur in fewer, bigger, chunks, rather than many small changes. People
> will get fatigued if multiple releases in a row change the
> test-support library from under their feet. Better to do it in one
> shot.
>
> Plus, this is a design discussion. We need to include the whole scope
> of the system in the design, or we may realize in Phase 3 that there
> was some design error in Phase 1, since we were only designing
> locally. This doesn't mean that we only need one Jira ticket, there
> can be many in support of this KIP, or that we only need one PR, it's
> certainly better to send multiple small PRs to decrease risk and ease
> reviews. But the design discussion doesn't need to be fragmented
> similarly.
>
> > 3. return null vs NoSuchElementException when empty queue
> >
> > Should this be also included to the above TTD usability changes?
> > If single row read methods is changed to throw expectiong, it would
> require
> > addition of hasRecords to able to verified the empty queue scenarios.
> > I do not know how to implement it currently without modifying TTD to
> > provide some kind way to get the queue size or peak items.
>
> Yes, it's absolutely within bounds to propose changes to TTD to
> support the ergonomic API you're proposing.
>
> > 4. IllegalArgumentException("topic doesn't exist")
> > Is this worth separate ticket?
>
> This is your call. That was just an idea in response to your experience.
>
> > 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
> >
> > I was thinking org.apache.kafka.streams.test where also OutputVerifier
> and
> > ConsumerRecordFactory exist would be more logical place, but
> > I do not know is there some technical reasons why TTD are in
> > org.apache.kafka.streams, not in org.apache.kafka.streams.test where
> other
> > classes are.
> >
> > Did I skip something?
>
> Ah, no, you're right. I'm not sure why that is. I admit it's
> confusing. I don't think the package matters *that* much, just keep it
> wherever you think is appropriate.
>
>
> That's all! Thanks for entertaining my thoughts.
> -John
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by John Roesler <jo...@confluent.io>.
Hi Jukka, thanks for the reply!

I think this is a good summary (the discussion was getting a little
unwieldy. I'll reply inline.

Also, thanks for clarify about your library vs. this KIP. That makes
perfect sense to me.
>
> 1. Add JavaDoc for KIP
>
> Is there a good example of KIP where Javadoc is included, so I can follow?
> I create this KIP based on this as an example::
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
>
>
> Now added some comments to KIP page to clarify timestamp handling, but I
> did not want to add full JavaDoc of each methods.
> Is that enough?

That's fine. I was just trying to make the review process more
efficient for other reviewers (which makes getting your KIP accepted
more efficient). I reviewed a few recent KIPs, and, indeed, I see that
javadocs are not actually as common as I thought.

> 2. TTD usability changes and swapping ConsumerRecord and ProducerRecord in
> APIs
>
> To my point of view only:
> - changing readRecord to return ConsumerRecord would cause we cannot use
> OutputVerifier

Yes, we'd likely have to provide new methods in OutputVerifier to work
with ConsumerRecord. If you buy into the plan of deprecating most of
the current-style interactions, this wouldn't be that confusing, since
all the ProducerRecord verifications would be deprecated, and only the
ConsumerRecord verifications would remain "live".

> - changing pipeInput to take in ProducerRecord, but not providing easy way
> to contruct those like ConsumerRecordFactory

I didn't follow this as well. The ConsumerRecordFactory is there
because it's a pain to construct ConsumerRecords. Conversely,
ProducerRecord has many convenience constructors, so we wouldn't need
a factory at all. This is a net win for users, since there's less
surface area for them to deal with. Under my proposal, we'd deprecate
the whole ConsumerRecordFactory.

Note that there's an "idea parity check" here: ConsumerRecords are
hard to construct because developers aren't meant to ever construct
them. They are meant to construct ProducerRecords, which is why it's
made easy. TTD has inverted the relationships of these classes, which
is why the ConsumerRecordFactory is necessary, but if we correct it,
and return to a "normal" interaction with the Client library, then we
don't need special support classes.

> - if initializing ConsumerRecord to/from  ProducerRecord  in these classes
> field by field contructor, there are risk new fields are not added to this
> classes if there are changes in ProducerRecord or ConsumerRecord

This risk seems pretty low, to be honest. We will have tests that
exercise this testing framework, so if anyone changes ProducerRecord
or ConsumerRecord, our tests will break. Since both libraries are
build together, the problem would be fixed before the change is ever
merged to trunk.

> I would propose a separate KIP for these and probably other enhanchements:
> -superclass or common interface for ConsumerRecord and ProducerRecord
> -contructors to ConsumerRecord and ProducerRecord to initialize with this
> superclass
> -modify OutputVerifier to work with both ConsumerRecord and ProducerRecord
> -create new RecordFactory to replace ConsumerRecordFactory

I understand your motivation to control the scope of this change, but
I actually think that it's better for user-facing design changes to
occur in fewer, bigger, chunks, rather than many small changes. People
will get fatigued if multiple releases in a row change the
test-support library from under their feet. Better to do it in one
shot.

Plus, this is a design discussion. We need to include the whole scope
of the system in the design, or we may realize in Phase 3 that there
was some design error in Phase 1, since we were only designing
locally. This doesn't mean that we only need one Jira ticket, there
can be many in support of this KIP, or that we only need one PR, it's
certainly better to send multiple small PRs to decrease risk and ease
reviews. But the design discussion doesn't need to be fragmented
similarly.

> 3. return null vs NoSuchElementException when empty queue
>
> Should this be also included to the above TTD usability changes?
> If single row read methods is changed to throw expectiong, it would require
> addition of hasRecords to able to verified the empty queue scenarios.
> I do not know how to implement it currently without modifying TTD to
> provide some kind way to get the queue size or peak items.

Yes, it's absolutely within bounds to propose changes to TTD to
support the ergonomic API you're proposing.

> 4. IllegalArgumentException("topic doesn't exist")
> Is this worth separate ticket?

This is your call. That was just an idea in response to your experience.

> 5. org.apache.kafka.streams.test vs org.apache.kafka.streams
>
> I was thinking org.apache.kafka.streams.test where also OutputVerifier and
> ConsumerRecordFactory exist would be more logical place, but
> I do not know is there some technical reasons why TTD are in
> org.apache.kafka.streams, not in org.apache.kafka.streams.test where other
> classes are.
>
> Did I skip something?

Ah, no, you're right. I'm not sure why that is. I admit it's
confusing. I don't think the package matters *that* much, just keep it
wherever you think is appropriate.


That's all! Thanks for entertaining my thoughts.
-John

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Jukka Karvanen <ju...@jukinimi.com>.
Hi,

This KIP is relating to add classes to kafka-stream-test-utils.
My initial plan was to create new project for these helper classes before I
decided to contribute directly to Apache Kafka with this KIP.
I have one streams project where I have used these classes, that is why I
planned to release this as separate "early access" package
before the Kafka release including this is publicly available. This would
work at the same time "a historically-compatible version of the library".

I try to summarize the open topics after your last email:

1. Add JavaDoc for KIP

Is there a good example of KIP where Javadoc is included, so I can follow?
I create this KIP based on this as an example::
https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams


Now added some comments to KIP page to clarify timestamp handling, but I
did not want to add full JavaDoc of each methods.
Is that enough?

2. TTD usability changes and swapping ConsumerRecord and ProducerRecord in
APIs

To my point of view only:
- changing readRecord to return ConsumerRecord would cause we cannot use
OutputVerifier
- changing pipeInput to take in ProducerRecord, but not providing easy way
to contruct those like ConsumerRecordFactory
- if initializing ConsumerRecord to/from  ProducerRecord  in these classes
field by field contructor, there are risk new fields are not added to this
classes if there are changes in ProducerRecord or ConsumerRecord

I would propose a separate KIP for these and probably other enhanchements:
-superclass or common interface for ConsumerRecord and ProducerRecord
-contructors to ConsumerRecord and ProducerRecord to initialize with this
superclass
-modify OutputVerifier to work with both ConsumerRecord and ProducerRecord
-create new RecordFactory to replace ConsumerRecordFactory


3. return null vs NoSuchElementException when empty queue

Should this be also included to the above TTD usability changes?
If single row read methods is changed to throw expectiong, it would require
addition of hasRecords to able to verified the empty queue scenarios.
I do not know how to implement it currently without modifying TTD to
provide some kind way to get the queue size or peak items.

4. IllegalArgumentException("topic doesn't exist")
Is this worth separate ticket?

5. org.apache.kafka.streams.test vs org.apache.kafka.streams

I was thinking org.apache.kafka.streams.test where also OutputVerifier and
ConsumerRecordFactory exist would be more logical place, but
I do not know is there some technical reasons why TTD are in
org.apache.kafka.streams, not in org.apache.kafka.streams.test where other
classes are.

Did I skip something?

Jukka


ti 7. toukok. 2019 klo 22.02 John Roesler (john@confluent.io) kirjoitti:

> Thanks for the responses, Jukka!
>
> Thanks for the reference to the javadocs in your library, but I
> actually meant they should be part of the KIP document.
>
> As a general comment, I did get that your intent is to smooth over
> some rough spots with the current testing library, and that many of
> your API/behavior decisions are just reflecting the underlying TTD.
> What I'm wondering is whether we can take the opportunity to make a
> bigger change for an even smoother usability experience.
>
> Regarding the publishing artifact, did you mean that you're proposing
> to add a separate build artifact to the Apache Kafka project, or just
> that you plan to host a historically-compatible version of the library
> on your own? My initial reaction is that we should just include this
> stuff in the test-utils package starting at whatever version. This
> gives us greater latitude to modify the underlying layers in service
> of this KIP. Plus, it's nicer as a user to just have one test-support
> artifact to manage.
>
> Specific responses below:
> > >InputTopic:
> > >
> > >1. Have you considered adding a ProducerRecord input method to the input
> > >topic? This might be unnecessary, given the overloads you provide. I'm
> > >wondering if it decreases the domain mismatch between TopologyTestDriver
> > >and KafkaStreams, though, since in production code, you send input to
> the
> > >app as a ProducerRecord, via a topic. Also, this might let you drop
> some of
> > >the less mainstream overloads, like the ones for headers.
> >
> > Ok, so not same methods as in TopologyTestDriver:
> > pipeInput(ConsumerRecord<byte[], byte[]> consumerRecord);
> > pipeInput(List<ConsumerRecord<byte[], byte[]>> records);
> > but instead:
> > pipeInput(ProducerRecord <K, V]> record);
> > pipeInput(List< ProducerRecord  <K, V]>> records);
> > Which would convert ProductRecords to ConsumerRecords before piping to
> TTD
> >
> > and Drop these methods
> >     void pipeInput(K key, V value, Headers headers);
> >     void pipeInput(K key, V value, Headers headers, long timestampMs);
>
> Yes, this is essentially what I was saying.
>
> >
> > In this case you would not able to create those with
> ConsumerRecordFactory,
> > but needing to create  ProducerRecord   objects directly.one by one or
> with
> > some helper methods.
> >
> > Generally I was hoping that ProducerRecord and ConsumerRecord would have
> > inherited to same base class or implemented some kind KafkaRecord
> > interface, but
> > it is not the case now. So in the most of  case it would have enough to
> > pass KafkaRecord instead of ProducerRecord like in OutputVerifier now.
> >
> > What do you think?
> >
> I agree it would be nice in some cases to have a symmetric superclass
> to capture the common fields in Producer and Consumer records, but I
> don't know if I see how that's an impediment here.
>
> Part of what I was thinking below is that the interface you're
> proposing may actually render the ConsumerRecordFactory unnecessary as
> a public interface. Since we're taking a typed ProducerRecord<K,V>
>
> > >2. On the "pipeList" methods, it seems like the "start,advance"
> timestamp
> > >approach forces me to do a little mental math, if I actually want to
> > >achieve some specific timestamp per record, or to be able to verify the
> > >result, given specific timestamps as input. Did you consider a
> > >KeyValueTimestamp value type instead? Alternatively, if you like the
> > >ProducerRecord approach, above, you could lean on that instead.
> >
> > This start + advance concept is coming directly from
> ConsumerRecordFactory.
> > I don't see value of adding KeyValueTimestamp class.
> >
> > My own approach has been piping in the seeds with list and pipe special
> > times with:
> > void pipeInput(K key, V value, long timestampMs);
> >
> > This start + n*interval approach will fit the normal happy case scenarios
> > and
> > for timing relating test special cases I have used pipe single record at
> a
> > time.
> >
> > And that ProducerRecord approach is one alternative, but you end up
> > generating that list some special methods anyway,
> > which could also pipe those in one by one.
> >
>
> Ok, my intuition may not be the best here, because I do a *lot* of
> timestamp-sensitive testing. I think your experience is more typical.
> Plus, it sounds like you're no opposed to adding ProducerRecord<K,V>
> input, which renders KeyValueTimestamp totally unnecessary.
>
> I do understand that this "start,advance" API comes from
> ConsumerRecordFactory, I'm sorry if it seemed I was being critical of
> your proposal. I think that if I actually want to verify the right
> thing happens with timestamps, then the ProducerRecord input would let
> me do that, and for other cases, where I just need "some" timestamp,
> then start+advance works fine.
>
> > >
> > >3. I wasn't clear on the semantics of the constructors that take a start
> > >timestamp, but no advance time. I also wasn't clear on the semantics
> when
> > >the constructor specifies start/advance, but then we also call the input
> > >methods that specify timestamps, or start/advance timestamps. Also
> related,
> > >what's the "default" timestamp, if no start is specified, "zero" or
> "now"
> > >both seem reasonable. Similar with the advance, "1ms" and "0ms" both
> seem
> > >reasonable defaults.
> >
> > This is also based on the implementation of ConsumerRecordFactory, which
> > seems to
> > set the base time with System.currentTimeMillis() if not set,advanceMs
> is 0
> > and
> > you can overwrite these timestamp providing it with the method call.
> >
>
> I acknowledge that this is coming through from ConsumerRecordFactory,
> but I'm still wondering if it's the API we should be offering. This
> could be a good idea to correct an ambiguity that we didn't notice
> before.
>
> Also, note that even if the semantics are governed by the underlying
> library, they still need to be clear to someone looking just at this
> interface. Perhaps adding Javadocs to this KIP would help clear this
> up.
>
> > >
> > >OutputTopic:
> > >
> > >4. Tentatively, ProducerRecord seems like a strange output type, since
> as a
> > >user, I'm "consuming" the results. How about using ConsumerRecord
> instead?
> >
> > This is also directly from TTD. Also for this that KafkaRecord concept
> > would be better, but would
> > require alignment with existing classes, too.
>
> Despite the underlying library, do you think it would work to offer
> ConsumerRecord as an output type?
>
> > >
> > >5. We have methods above for producing with specific timestamps, but
> none
> > >for observing the timestamps. How can we strengthen the symmetry?
> >
> > My own experience with testing with TTD.
> > Those timestamp are important when feeding in the data, so can test for
> > example
> > timewindows and joins with left side or right side arriving first.
> >
> > For the result verification, I have not so far checked the Kafka
> timestamps.
> > Timewindow related information is available in the window key and the
> joins
> > the output need to be verified, not actual timestamp of it.
> > So for those case where you need it, those could be checked with
> > fetching ProducerRecord
> > one by one.
> >
> > What do you think?
>
> Ack. As I mentioned before, my experience is a little different than
> normal, so I think I trust your intuition more than my own.
>
> > >
> > >6. (just a comment) I like the readToMap method. Thanks!
> > >
> > Good.
> >
> > >7. I know it clashes somewhat with the Kafka semantics, but I'm a little
> > >concerned about null semantics in the output topic. In particular,
> > >"readValue()" returning null ambiguously indicates both "there is no
> value"
> > >and "there is a null value present". Can we consider throwing a
> > >NoSuchElementException when you attempt to read, but there is nothing
> > there?
> >
> > Yes, that readValue with null, is little confusing.
> > If you need to check value equal null, you need to fetch KeyValue with
> > readKeyValue instead.
> > Does it solve the same problem?
>
> I'm not sure. It feels like we're setting a trap for people. My spidey
> sense is still saying that a NoSuchElementException on all the
> read-single-value APIs is the best solution.
>
> >
> > Also what I noticed from the current implementation of TTD, that if you
> > pipe in to non-existing topic,
> > it generated Exception, but reading from non-existing topic return only
> > null.
> >
> > So getting null from readValue can mean you have mistyped the topic name,
> > there are no record to consumer or value of it is null.
>
> That is a bummer. I guess we could even add an
> IllegalArgumentException("topic doesn't exist") or something for this
> case.
>
> >
> > >
> > >7.5. This would necessitate adding a boolean method to query the
> presence
> > >of output records, which can be a handy way to cap off a test:
> > >`assertFalse(outputTopic.hasRecords(),
> > >outputTopic.readKeyValuesToList().toString())` would fail and print out
> the
> > >remaining records if there are any.
> >
> > Yes, I have thinking about the same, but so far I have used something
> like
> >
> > assertThat(outputTopic.readRecord(), nullValue());
> >
> > which will output the next record if it is failing.
>
> Yes, that does about the same thing, if we preserve the null-return
> behavior.
>
> >
> > >
> > >General:
> > >
> > >8. Can the TTD, input, and output topics implement AutoCloseable, to
> > >facilitate try-with-resources in tests?
> > >
> > >Example:
> > >try (driver = new TTD(), input = new TestInputTopic(), output = new
> > >TestOutputTopic() ) {
> > > ...
> > >} // ensures everything is closed
> > >
> > In the current implementation Topics does not have any state and do not
> > need the closing.
>
> That's fair.
>
> >
> > >7. Should we add some assertion that all the output is consumed when you
> > >call testDriver.close() ?
> >
> > Not by default at least. There are scenarios where you validate only
> > OutputTopic even the stream is generating others.
> > Maybe some kind of optionally called boolean method to check all topics
> in
> > TTD might be good.
>
> Yeah, you're right. In retrospect, this idea doesn't seem that useful.
>
> > >
> > >8. Should we consider deprecating (some or all) of the overlapping
> > >mechanisms in TopologyTestDriver? It seems like this might improve
> > >usability by reducing ambiguity.
> > The most of the methods do not overlap, only those
> Consumer/ProducerRecord
> > based ones, but
> > the problem with the current implementation is you can deprecate those
> > directly, because they are used by these classes.
> > So there would be need add some other non public methods to be used by
> > Topic classes in this case.
> > Also relating to it, these are now in org.apache.kafka.streams.test
> package
> > not in same package as TTD in org.apache.kafka.streams.
>
> Sorry, my wording was ambiguous, I meant deprecate
> ConsumerRecordFactory, along with any mechanism to produce or consumer
> data directly in/out of TopologyTestDriver. This would direct everyone
> to re-write their tests using this new topic-in/topic-out interface.
> Even if this is more work in the short term, people will generally
> benefit in the long run from not having to make a decision about which
> i/o API to use for every single test.
>
> The package is a good point. Why shouldn't we just put the new stuff
> in the same package as TTD?
>
> >
> > >
> > >9. Can you give us some idea of the javadoc for each of the new methods
> > >you're proposing? The documentation is also part of the public API, and
> it
> > >also helps us understand the semantics of the operations you're
> > proposing.
> > >
> >
> > My current version of JavaDoc can be found, here:
> > https://jukkakarvanen.github.io/kafka-streams-test-topics/
> >
> > Also I made proposal for Developer Guide Testing changes to
> > docs/streams/developer-guide/testing.html
> > <
> https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics#diff-3e6e61eaf18f0d35cdb94ed705581b55
> >
> > :
> >
> https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> >
> > (Not sure is there better way to check those, than in Github diff.)
> >
> > So, thanks for the feedback and I am happy to change those things you see
> > viable after my comments.
>
> Thanks! There's no need to include documentation changes in the KIP,
> but please include the proposed javadoc in the KIP design doc itself.
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by John Roesler <jo...@confluent.io>.
Thanks for the responses, Jukka!

Thanks for the reference to the javadocs in your library, but I
actually meant they should be part of the KIP document.

As a general comment, I did get that your intent is to smooth over
some rough spots with the current testing library, and that many of
your API/behavior decisions are just reflecting the underlying TTD.
What I'm wondering is whether we can take the opportunity to make a
bigger change for an even smoother usability experience.

Regarding the publishing artifact, did you mean that you're proposing
to add a separate build artifact to the Apache Kafka project, or just
that you plan to host a historically-compatible version of the library
on your own? My initial reaction is that we should just include this
stuff in the test-utils package starting at whatever version. This
gives us greater latitude to modify the underlying layers in service
of this KIP. Plus, it's nicer as a user to just have one test-support
artifact to manage.

Specific responses below:
> >InputTopic:
> >
> >1. Have you considered adding a ProducerRecord input method to the input
> >topic? This might be unnecessary, given the overloads you provide. I'm
> >wondering if it decreases the domain mismatch between TopologyTestDriver
> >and KafkaStreams, though, since in production code, you send input to the
> >app as a ProducerRecord, via a topic. Also, this might let you drop some of
> >the less mainstream overloads, like the ones for headers.
>
> Ok, so not same methods as in TopologyTestDriver:
> pipeInput(ConsumerRecord<byte[], byte[]> consumerRecord);
> pipeInput(List<ConsumerRecord<byte[], byte[]>> records);
> but instead:
> pipeInput(ProducerRecord <K, V]> record);
> pipeInput(List< ProducerRecord  <K, V]>> records);
> Which would convert ProductRecords to ConsumerRecords before piping to TTD
>
> and Drop these methods
>     void pipeInput(K key, V value, Headers headers);
>     void pipeInput(K key, V value, Headers headers, long timestampMs);

Yes, this is essentially what I was saying.

>
> In this case you would not able to create those with ConsumerRecordFactory,
> but needing to create  ProducerRecord   objects directly.one by one or with
> some helper methods.
>
> Generally I was hoping that ProducerRecord and ConsumerRecord would have
> inherited to same base class or implemented some kind KafkaRecord
> interface, but
> it is not the case now. So in the most of  case it would have enough to
> pass KafkaRecord instead of ProducerRecord like in OutputVerifier now.
>
> What do you think?
>
I agree it would be nice in some cases to have a symmetric superclass
to capture the common fields in Producer and Consumer records, but I
don't know if I see how that's an impediment here.

Part of what I was thinking below is that the interface you're
proposing may actually render the ConsumerRecordFactory unnecessary as
a public interface. Since we're taking a typed ProducerRecord<K,V>

> >2. On the "pipeList" methods, it seems like the "start,advance" timestamp
> >approach forces me to do a little mental math, if I actually want to
> >achieve some specific timestamp per record, or to be able to verify the
> >result, given specific timestamps as input. Did you consider a
> >KeyValueTimestamp value type instead? Alternatively, if you like the
> >ProducerRecord approach, above, you could lean on that instead.
>
> This start + advance concept is coming directly from ConsumerRecordFactory.
> I don't see value of adding KeyValueTimestamp class.
>
> My own approach has been piping in the seeds with list and pipe special
> times with:
> void pipeInput(K key, V value, long timestampMs);
>
> This start + n*interval approach will fit the normal happy case scenarios
> and
> for timing relating test special cases I have used pipe single record at a
> time.
>
> And that ProducerRecord approach is one alternative, but you end up
> generating that list some special methods anyway,
> which could also pipe those in one by one.
>

Ok, my intuition may not be the best here, because I do a *lot* of
timestamp-sensitive testing. I think your experience is more typical.
Plus, it sounds like you're no opposed to adding ProducerRecord<K,V>
input, which renders KeyValueTimestamp totally unnecessary.

I do understand that this "start,advance" API comes from
ConsumerRecordFactory, I'm sorry if it seemed I was being critical of
your proposal. I think that if I actually want to verify the right
thing happens with timestamps, then the ProducerRecord input would let
me do that, and for other cases, where I just need "some" timestamp,
then start+advance works fine.

> >
> >3. I wasn't clear on the semantics of the constructors that take a start
> >timestamp, but no advance time. I also wasn't clear on the semantics when
> >the constructor specifies start/advance, but then we also call the input
> >methods that specify timestamps, or start/advance timestamps. Also related,
> >what's the "default" timestamp, if no start is specified, "zero" or "now"
> >both seem reasonable. Similar with the advance, "1ms" and "0ms" both seem
> >reasonable defaults.
>
> This is also based on the implementation of ConsumerRecordFactory, which
> seems to
> set the base time with System.currentTimeMillis() if not set,advanceMs is 0
> and
> you can overwrite these timestamp providing it with the method call.
>

I acknowledge that this is coming through from ConsumerRecordFactory,
but I'm still wondering if it's the API we should be offering. This
could be a good idea to correct an ambiguity that we didn't notice
before.

Also, note that even if the semantics are governed by the underlying
library, they still need to be clear to someone looking just at this
interface. Perhaps adding Javadocs to this KIP would help clear this
up.

> >
> >OutputTopic:
> >
> >4. Tentatively, ProducerRecord seems like a strange output type, since as a
> >user, I'm "consuming" the results. How about using ConsumerRecord instead?
>
> This is also directly from TTD. Also for this that KafkaRecord concept
> would be better, but would
> require alignment with existing classes, too.

Despite the underlying library, do you think it would work to offer
ConsumerRecord as an output type?

> >
> >5. We have methods above for producing with specific timestamps, but none
> >for observing the timestamps. How can we strengthen the symmetry?
>
> My own experience with testing with TTD.
> Those timestamp are important when feeding in the data, so can test for
> example
> timewindows and joins with left side or right side arriving first.
>
> For the result verification, I have not so far checked the Kafka timestamps.
> Timewindow related information is available in the window key and the joins
> the output need to be verified, not actual timestamp of it.
> So for those case where you need it, those could be checked with
> fetching ProducerRecord
> one by one.
>
> What do you think?

Ack. As I mentioned before, my experience is a little different than
normal, so I think I trust your intuition more than my own.

> >
> >6. (just a comment) I like the readToMap method. Thanks!
> >
> Good.
>
> >7. I know it clashes somewhat with the Kafka semantics, but I'm a little
> >concerned about null semantics in the output topic. In particular,
> >"readValue()" returning null ambiguously indicates both "there is no value"
> >and "there is a null value present". Can we consider throwing a
> >NoSuchElementException when you attempt to read, but there is nothing
> there?
>
> Yes, that readValue with null, is little confusing.
> If you need to check value equal null, you need to fetch KeyValue with
> readKeyValue instead.
> Does it solve the same problem?

I'm not sure. It feels like we're setting a trap for people. My spidey
sense is still saying that a NoSuchElementException on all the
read-single-value APIs is the best solution.

>
> Also what I noticed from the current implementation of TTD, that if you
> pipe in to non-existing topic,
> it generated Exception, but reading from non-existing topic return only
> null.
>
> So getting null from readValue can mean you have mistyped the topic name,
> there are no record to consumer or value of it is null.

That is a bummer. I guess we could even add an
IllegalArgumentException("topic doesn't exist") or something for this
case.

>
> >
> >7.5. This would necessitate adding a boolean method to query the presence
> >of output records, which can be a handy way to cap off a test:
> >`assertFalse(outputTopic.hasRecords(),
> >outputTopic.readKeyValuesToList().toString())` would fail and print out the
> >remaining records if there are any.
>
> Yes, I have thinking about the same, but so far I have used something like
>
> assertThat(outputTopic.readRecord(), nullValue());
>
> which will output the next record if it is failing.

Yes, that does about the same thing, if we preserve the null-return behavior.

>
> >
> >General:
> >
> >8. Can the TTD, input, and output topics implement AutoCloseable, to
> >facilitate try-with-resources in tests?
> >
> >Example:
> >try (driver = new TTD(), input = new TestInputTopic(), output = new
> >TestOutputTopic() ) {
> > ...
> >} // ensures everything is closed
> >
> In the current implementation Topics does not have any state and do not
> need the closing.

That's fair.

>
> >7. Should we add some assertion that all the output is consumed when you
> >call testDriver.close() ?
>
> Not by default at least. There are scenarios where you validate only
> OutputTopic even the stream is generating others.
> Maybe some kind of optionally called boolean method to check all topics in
> TTD might be good.

Yeah, you're right. In retrospect, this idea doesn't seem that useful.

> >
> >8. Should we consider deprecating (some or all) of the overlapping
> >mechanisms in TopologyTestDriver? It seems like this might improve
> >usability by reducing ambiguity.
> The most of the methods do not overlap, only those Consumer/ProducerRecord
> based ones, but
> the problem with the current implementation is you can deprecate those
> directly, because they are used by these classes.
> So there would be need add some other non public methods to be used by
> Topic classes in this case.
> Also relating to it, these are now in org.apache.kafka.streams.test package
> not in same package as TTD in org.apache.kafka.streams.

Sorry, my wording was ambiguous, I meant deprecate
ConsumerRecordFactory, along with any mechanism to produce or consumer
data directly in/out of TopologyTestDriver. This would direct everyone
to re-write their tests using this new topic-in/topic-out interface.
Even if this is more work in the short term, people will generally
benefit in the long run from not having to make a decision about which
i/o API to use for every single test.

The package is a good point. Why shouldn't we just put the new stuff
in the same package as TTD?

>
> >
> >9. Can you give us some idea of the javadoc for each of the new methods
> >you're proposing? The documentation is also part of the public API, and it
> >also helps us understand the semantics of the operations you're
> proposing.
> >
>
> My current version of JavaDoc can be found, here:
> https://jukkakarvanen.github.io/kafka-streams-test-topics/
>
> Also I made proposal for Developer Guide Testing changes to
> docs/streams/developer-guide/testing.html
> <https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics#diff-3e6e61eaf18f0d35cdb94ed705581b55>
> :
> https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
>
> (Not sure is there better way to check those, than in Github diff.)
>
> So, thanks for the feedback and I am happy to change those things you see
> viable after my comments.

Thanks! There's no need to include documentation changes in the KIP,
but please include the proposed javadoc in the KIP design doc itself.

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Jukka Karvanen <ju...@jukinimi.com>.
Thanks John for feedback.

General comment first, which affect many of the individual answers.
My current approach has been abstracting the current functionality of
TopologyTestDriver and ConsumerRecordFactory
and that way the Contructors and pipe-methods signatures of TestInputTopic are
replicated from ConsumerRecordFactory.
Main addition are:
-constructor versions with serde instead of serializer / deserializer
-TestInputTopic possibility to feed in Value list instead of only
KeyValueList
-TestOutputTopic methods to unwrap  ProducerRecord and return collections
of unwrapped key values and values

The JavaDoc of current implementation can be found:
https://jukkakarvanen.github.io/kafka-streams-test-topics/
The classes and methods are identical even the package name is different.
Also these JavaDoc comments are replicating a lot of descriptions from
existing class methods JavaDocs..

I was planning to publish these classes as separate artifact so these can
be used also with older Kafka versions.
In my test with Kafka version 1.1, only the method including Headers
failed, all other functionality was working without modifications.


>InputTopic:
>
>1. Have you considered adding a ProducerRecord input method to the input
>topic? This might be unnecessary, given the overloads you provide. I'm
>wondering if it decreases the domain mismatch between TopologyTestDriver
>and KafkaStreams, though, since in production code, you send input to the
>app as a ProducerRecord, via a topic. Also, this might let you drop some of
>the less mainstream overloads, like the ones for headers.

Ok, so not same methods as in TopologyTestDriver:
pipeInput(ConsumerRecord<byte[], byte[]> consumerRecord);
pipeInput(List<ConsumerRecord<byte[], byte[]>> records);
but instead:
pipeInput(ProducerRecord <K, V]> record);
pipeInput(List< ProducerRecord  <K, V]>> records);
Which would convert ProductRecords to ConsumerRecords before piping to TTD

and Drop these methods
    void pipeInput(K key, V value, Headers headers);
    void pipeInput(K key, V value, Headers headers, long timestampMs);

In this case you would not able to create those with ConsumerRecordFactory,
but needing to create  ProducerRecord   objects directly.one by one or with
some helper methods.

Generally I was hoping that ProducerRecord and ConsumerRecord would have
inherited to same base class or implemented some kind KafkaRecord
interface, but
it is not the case now. So in the most of  case it would have enough to
pass KafkaRecord instead of ProducerRecord like in OutputVerifier now.

What do you think?

>2. On the "pipeList" methods, it seems like the "start,advance" timestamp
>approach forces me to do a little mental math, if I actually want to
>achieve some specific timestamp per record, or to be able to verify the
>result, given specific timestamps as input. Did you consider a
>KeyValueTimestamp value type instead? Alternatively, if you like the
>ProducerRecord approach, above, you could lean on that instead.

This start + advance concept is coming directly from ConsumerRecordFactory.
I don't see value of adding KeyValueTimestamp class.

My own approach has been piping in the seeds with list and pipe special
times with:
void pipeInput(K key, V value, long timestampMs);

This start + n*interval approach will fit the normal happy case scenarios
and
for timing relating test special cases I have used pipe single record at a
time.

And that ProducerRecord approach is one alternative, but you end up
generating that list some special methods anyway,
which could also pipe those in one by one.

>
>3. I wasn't clear on the semantics of the constructors that take a start
>timestamp, but no advance time. I also wasn't clear on the semantics when
>the constructor specifies start/advance, but then we also call the input
>methods that specify timestamps, or start/advance timestamps. Also related,
>what's the "default" timestamp, if no start is specified, "zero" or "now"
>both seem reasonable. Similar with the advance, "1ms" and "0ms" both seem
>reasonable defaults.

This is also based on the implementation of ConsumerRecordFactory, which
seems to
set the base time with System.currentTimeMillis() if not set,advanceMs is 0
and
you can overwrite these timestamp providing it with the method call.

>
>OutputTopic:
>
>4. Tentatively, ProducerRecord seems like a strange output type, since as a
>user, I'm "consuming" the results. How about using ConsumerRecord instead?

This is also directly from TTD. Also for this that KafkaRecord concept
would be better, but would
require alignment with existing classes, too.

>
>5. We have methods above for producing with specific timestamps, but none
>for observing the timestamps. How can we strengthen the symmetry?

My own experience with testing with TTD.
Those timestamp are important when feeding in the data, so can test for
example
timewindows and joins with left side or right side arriving first.

For the result verification, I have not so far checked the Kafka timestamps.
Timewindow related information is available in the window key and the joins
the output need to be verified, not actual timestamp of it.
So for those case where you need it, those could be checked with
fetching ProducerRecord
one by one.

What do you think?

>
>6. (just a comment) I like the readToMap method. Thanks!
>
Good.

>7. I know it clashes somewhat with the Kafka semantics, but I'm a little
>concerned about null semantics in the output topic. In particular,
>"readValue()" returning null ambiguously indicates both "there is no value"
>and "there is a null value present". Can we consider throwing a
>NoSuchElementException when you attempt to read, but there is nothing
there?

Yes, that readValue with null, is little confusing.
If you need to check value equal null, you need to fetch KeyValue with
readKeyValue instead.
Does it solve the same problem?

Also what I noticed from the current implementation of TTD, that if you
pipe in to non-existing topic,
it generated Exception, but reading from non-existing topic return only
null.

So getting null from readValue can mean you have mistyped the topic name,
there are no record to consumer or value of it is null.

>
>7.5. This would necessitate adding a boolean method to query the presence
>of output records, which can be a handy way to cap off a test:
>`assertFalse(outputTopic.hasRecords(),
>outputTopic.readKeyValuesToList().toString())` would fail and print out the
>remaining records if there are any.

Yes, I have thinking about the same, but so far I have used something like

assertThat(outputTopic.readRecord(), nullValue());

which will output the next record if it is failing.

>
>General:
>
>8. Can the TTD, input, and output topics implement AutoCloseable, to
>facilitate try-with-resources in tests?
>
>Example:
>try (driver = new TTD(), input = new TestInputTopic(), output = new
>TestOutputTopic() ) {
> ...
>} // ensures everything is closed
>
In the current implementation Topics does not have any state and do not
need the closing.


>7. Should we add some assertion that all the output is consumed when you
>call testDriver.close() ?

Not by default at least. There are scenarios where you validate only
OutputTopic even the stream is generating others.
Maybe some kind of optionally called boolean method to check all topics in
TTD might be good.
>
>8. Should we consider deprecating (some or all) of the overlapping
>mechanisms in TopologyTestDriver? It seems like this might improve
>usability by reducing ambiguity.
The most of the methods do not overlap, only those Consumer/ProducerRecord
based ones, but
the problem with the current implementation is you can deprecate those
directly, because they are used by these classes.
So there would be need add some other non public methods to be used by
Topic classes in this case.
Also relating to it, these are now in org.apache.kafka.streams.test package
not in same package as TTD in org.apache.kafka.streams.

>
>9. Can you give us some idea of the javadoc for each of the new methods
>you're proposing? The documentation is also part of the public API, and it
>also helps us understand the semantics of the operations you're
proposing.
>

My current version of JavaDoc can be found, here:
https://jukkakarvanen.github.io/kafka-streams-test-topics/

Also I made proposal for Developer Guide Testing changes to
docs/streams/developer-guide/testing.html
<https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics#diff-3e6e61eaf18f0d35cdb94ed705581b55>
:
https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics

(Not sure is there better way to check those, than in Github diff.)

So, thanks for the feedback and I am happy to change those things you see
viable after my comments.

Jukka

ma 6. toukok. 2019 klo 19.41 John Roesler (john@confluent.io) kirjoitti:

> Thanks for the KIP, Jukka! Sorry it took so long for me to review it.
>
> Just a few questions, in no particular order:
>
> InputTopic:
>
> 1. Have you considered adding a ProducerRecord input method to the input
> topic? This might be unnecessary, given the overloads you provide. I'm
> wondering if it decreases the domain mismatch between TopologyTestDriver
> and KafkaStreams, though, since in production code, you send input to the
> app as a ProducerRecord, via a topic. Also, this might let you drop some of
> the less mainstream overloads, like the ones for headers.
>
> 2. On the "pipeList" methods, it seems like the "start,advance" timestamp
> approach forces me to do a little mental math, if I actually want to
> achieve some specific timestamp per record, or to be able to verify the
> result, given specific timestamps as input. Did you consider a
> KeyValueTimestamp value type instead? Alternatively, if you like the
> ProducerRecord approach, above, you could lean on that instead.
>
> 3. I wasn't clear on the semantics of the constructors that take a start
> timestamp, but no advance time. I also wasn't clear on the semantics when
> the constructor specifies start/advance, but then we also call the input
> methods that specify timestamps, or start/advance timestamps. Also related,
> what's the "default" timestamp, if no start is specified, "zero" or "now"
> both seem reasonable. Similar with the advance, "1ms" and "0ms" both seem
> reasonable defaults.
>
> OutputTopic:
>
> 4. Tentatively, ProducerRecord seems like a strange output type, since as a
> user, I'm "consuming" the results. How about using ConsumerRecord instead?
>
> 5. We have methods above for producing with specific timestamps, but none
> for observing the timestamps. How can we strengthen the symmetry?
>
> 6. (just a comment) I like the readToMap method. Thanks!
>
> 7. I know it clashes somewhat with the Kafka semantics, but I'm a little
> concerned about null semantics in the output topic. In particular,
> "readValue()" returning null ambiguously indicates both "there is no value"
> and "there is a null value present". Can we consider throwing a
> NoSuchElementException when you attempt to read, but there is nothing
> there?
>
> 7.5. This would necessitate adding a boolean method to query the presence
> of output records, which can be a handy way to cap off a test:
> `assertFalse(outputTopic.hasRecords(),
> outputTopic.readKeyValuesToList().toString())` would fail and print out the
> remaining records if there are any.
>
> General:
>
> 8. Can the TTD, input, and output topics implement AutoCloseable, to
> facilitate try-with-resources in tests?
>
> Example:
> try (driver = new TTD(), input = new TestInputTopic(), output = new
> TestOutputTopic() ) {
>  ...
> } // ensures everything is closed
>
> 7. Should we add some assertion that all the output is consumed when you
> call testDriver.close() ?
>
> 8. Should we consider deprecating (some or all) of the overlapping
> mechanisms in TopologyTestDriver? It seems like this might improve
> usability by reducing ambiguity.
>
> 9. Can you give us some idea of the javadoc for each of the new methods
> you're proposing? The documentation is also part of the public API, and it
> also helps us understand the semantics of the operations you're proposing.
>
> That's all! Thanks so much for this awesome proposal,
> -John
>
> On Mon, May 6, 2019 at 6:58 AM Jukka Karvanen <jukka.karvanen@jukinimi.com
> >
> wrote:
>
> > Hi,
> >
> > Now everything what I planned to add including tests, samples and
> document
> > changes are in my branch:
> >
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> >
> >
> > So I can create PR as soon as this KIP is getting green light to proceed.
> >
> > Jukka
> >
> > la 4. toukok. 2019 klo 9.05 Jukka Karvanen (jukka.karvanen@jukinimi.com)
> > kirjoitti:
> >
> > > Hi,
> > >
> > > New TestInputTopic and TestOutputTopic included to Developer guide
> > testing
> > > page as alternative,
> > > The old way with ConsumerRecordFactory and OutputVerifier is not
> removed.
> > >
> > > You can see the proposal here in my branch:
> > >
> > >
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> > >
> > >
> > > I can create Work In progress pull request if that make commenting
> > > proposal easier.
> > > Still planning to add full coverage unit test and sample
> > WordCountDemoTest to
> > >
> >
> streams/examples/src/test/java/org/apache/kafka/streams/examples/wordcount,
> > > if this KIP is accepted.
> > >
> > > Jukka
> > >
> > >
> > > ti 30. huhtik. 2019 klo 13.59 Matthias J. Sax (matthias@confluent.io)
> > > kirjoitti:
> > >
> > >> KIP-451 was discarded in favor this this KIP. So it seems we are all
> on
> > >> the same page.
> > >>
> > >>
> > >> >> Relating to KIP-448.
> > >> >> What kind of alignment did you think about?
> > >>
> > >> Nothing in particular. Was more or less a random though. Maybe there
> is
> > >> nothing to be aligned. Just wanted to bring it up. :)
> > >>
> > >>
> > >> >> Some thoughts after reading also the comments in KAFKA-6460:
> > >> >> To my understand these KIP-448 mock classes need to be fed somehow
> > into
> > >> >> TopologyTestDriver.
> > >> >> I don't know how those KIP-448 mock are planned to be set to
> > >> >> TopologyTestDriver.
> > >>
> > >> KIP-448 is still quite early, and it's a little unclear... Maybe we
> > >> should just ignore it for now. Sorry for the distraction with my
> comment
> > >> about it.
> > >>
> > >>
> > >> Please give me some more time to review this KIP in detail and I will
> > >> follow up later again.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 4/26/19 2:25 PM, Jukka Karvanen wrote:
> > >> > Yes, my understanding was also that this KIP cover all the
> requirement
> > >> of
> > >> > KIP-451.
> > >> >
> > >> > Relating to KIP-448.
> > >> > What kind of alignment did you think about?
> > >> >
> > >> > Some thoughts after reading also the comments in KAFKA-6460:
> > >> > To my understand these KIP-448 mock classes need to be fed somehow
> > into
> > >> > TopologyTestDriver.
> > >> > I don't know how those KIP-448 mock are planned to be set to
> > >> > TopologyTestDriver.
> > >> >
> > >> > On contrast KIP-456 was planned to be on top of unmodified
> > >> > TopologyTestDriver and now driver is set to TestInputTopic and
> > >> > TestOutputTopic in constructor.
> > >> > There are also alternative that these Topic object could be get from
> > >> > TopologyTestDriver, but it would require the duplicating the
> > >> constructors
> > >> > of Topics as methods to
> > >> > TopologyTestDriver.
> > >> >
> > >> > Also related to those Store object when going through the methods in
> > >> > TopologyTestDriver I noticed accessing the state stores could be be
> > the
> > >> > third candidate for helper class or a group of classes.
> > >> > So addition to have TestInputTopic and TestOutputTopic, it could be
> > also
> > >> > TestKeyValueStore, TestWindowStore, ... to follow the logic in this
> > >> > KPI-456, but
> > >> > it does add not any functionality on top of .already existing
> > >> functionality
> > >> > *Store classes. So that's why I did not include those.
> > >> >
> > >> > Jukka
> > >> > -
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > Not
> > >> >
> > >> > pe 26. huhtik. 2019 klo 12.03 Matthias J. Sax (
> matthias@confluent.io)
> > >> > kirjoitti:
> > >> >
> > >> >> Btw: there is also KIP-448. I was wondering if it might be required
> > or
> > >> >> helpful to align the design of both with each other. Thoughts?
> > >> >>
> > >> >>
> > >> >>
> > >> >> On 4/25/19 11:22 PM, Matthias J. Sax wrote:
> > >> >>> Thanks for the KIP!
> > >> >>>
> > >> >>> I was just comparing this KIP with KIP-451 (even if I did not yet
> > >> make a
> > >> >>> sorrow read over this KIP), and I agree that there is a big
> overlap.
> > >> It
> > >> >>> seems that KIP-456 might subsume KIP-451.
> > >> >>>
> > >> >>> Let's wait on Patrick's input to see how to proceed.
> > >> >>>
> > >> >>>
> > >> >>> -Matthias
> > >> >>>
> > >> >>> On 4/25/19 12:03 AM, Jukka Karvanen wrote:
> > >> >>>> Hi,
> > >> >>>>
> > >> >>>> If you want to see or test the my current idea of the
> > implementation
> > >> of
> > >> >>>> this KIP, you can check it out in my repo:
> > >> >>>>
> > >> >>
> > >>
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> > >> >>>>
> > >> >>>>
> > >> >>>> After my test with KPI-451  I do not see need for add methods for
> > >> >>>> Iterables, but waiting Patrick's clarification of the use case.
> > >> >>>>
> > >> >>>> Jukka
> > >> >>>>
> > >> >>>>
> > >> >>>> ti 23. huhtik. 2019 klo 15.37 Jukka Karvanen (
> > >> >> jukka.karvanen@jukinimi.com)
> > >> >>>> kirjoitti:
> > >> >>>>
> > >> >>>>> Hi All,
> > >> >>>>>
> > >> >>>>> I would like to start the discussion on KIP-456: Helper classes
> to
> > >> >> make it
> > >> >>>>> simpler to write test logic with TopologyTestDriver:
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> There is also related KIP adding methods to TopologyTestDriver:
> > >> >>>>>
> > >> >>>>>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I added those new Iterable based methods to this TestOutputTopic
> > >> even
> > >> >> not
> > >> >>>>> tested those myself yet.
> > >> >>>>> So this version contains both my original List and Map based
> > methods
> > >> >> and
> > >> >>>>> these new one.
> > >> >>>>> Based on the discussion some of these can be dropped, if those
> are
> > >> >> seen as
> > >> >>>>> redundant.
> > >> >>>>>
> > >> >>>>> Best Regards,
> > >> >>>>> Jukka
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> >
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by John Roesler <jo...@confluent.io>.
Thanks for the KIP, Jukka! Sorry it took so long for me to review it.

Just a few questions, in no particular order:

InputTopic:

1. Have you considered adding a ProducerRecord input method to the input
topic? This might be unnecessary, given the overloads you provide. I'm
wondering if it decreases the domain mismatch between TopologyTestDriver
and KafkaStreams, though, since in production code, you send input to the
app as a ProducerRecord, via a topic. Also, this might let you drop some of
the less mainstream overloads, like the ones for headers.

2. On the "pipeList" methods, it seems like the "start,advance" timestamp
approach forces me to do a little mental math, if I actually want to
achieve some specific timestamp per record, or to be able to verify the
result, given specific timestamps as input. Did you consider a
KeyValueTimestamp value type instead? Alternatively, if you like the
ProducerRecord approach, above, you could lean on that instead.

3. I wasn't clear on the semantics of the constructors that take a start
timestamp, but no advance time. I also wasn't clear on the semantics when
the constructor specifies start/advance, but then we also call the input
methods that specify timestamps, or start/advance timestamps. Also related,
what's the "default" timestamp, if no start is specified, "zero" or "now"
both seem reasonable. Similar with the advance, "1ms" and "0ms" both seem
reasonable defaults.

OutputTopic:

4. Tentatively, ProducerRecord seems like a strange output type, since as a
user, I'm "consuming" the results. How about using ConsumerRecord instead?

5. We have methods above for producing with specific timestamps, but none
for observing the timestamps. How can we strengthen the symmetry?

6. (just a comment) I like the readToMap method. Thanks!

7. I know it clashes somewhat with the Kafka semantics, but I'm a little
concerned about null semantics in the output topic. In particular,
"readValue()" returning null ambiguously indicates both "there is no value"
and "there is a null value present". Can we consider throwing a
NoSuchElementException when you attempt to read, but there is nothing there?

7.5. This would necessitate adding a boolean method to query the presence
of output records, which can be a handy way to cap off a test:
`assertFalse(outputTopic.hasRecords(),
outputTopic.readKeyValuesToList().toString())` would fail and print out the
remaining records if there are any.

General:

8. Can the TTD, input, and output topics implement AutoCloseable, to
facilitate try-with-resources in tests?

Example:
try (driver = new TTD(), input = new TestInputTopic(), output = new
TestOutputTopic() ) {
 ...
} // ensures everything is closed

7. Should we add some assertion that all the output is consumed when you
call testDriver.close() ?

8. Should we consider deprecating (some or all) of the overlapping
mechanisms in TopologyTestDriver? It seems like this might improve
usability by reducing ambiguity.

9. Can you give us some idea of the javadoc for each of the new methods
you're proposing? The documentation is also part of the public API, and it
also helps us understand the semantics of the operations you're proposing.

That's all! Thanks so much for this awesome proposal,
-John

On Mon, May 6, 2019 at 6:58 AM Jukka Karvanen <ju...@jukinimi.com>
wrote:

> Hi,
>
> Now everything what I planned to add including tests, samples and document
> changes are in my branch:
>
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
>
>
> So I can create PR as soon as this KIP is getting green light to proceed.
>
> Jukka
>
> la 4. toukok. 2019 klo 9.05 Jukka Karvanen (jukka.karvanen@jukinimi.com)
> kirjoitti:
>
> > Hi,
> >
> > New TestInputTopic and TestOutputTopic included to Developer guide
> testing
> > page as alternative,
> > The old way with ConsumerRecordFactory and OutputVerifier is not removed.
> >
> > You can see the proposal here in my branch:
> >
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> >
> >
> > I can create Work In progress pull request if that make commenting
> > proposal easier.
> > Still planning to add full coverage unit test and sample
> WordCountDemoTest to
> >
> streams/examples/src/test/java/org/apache/kafka/streams/examples/wordcount,
> > if this KIP is accepted.
> >
> > Jukka
> >
> >
> > ti 30. huhtik. 2019 klo 13.59 Matthias J. Sax (matthias@confluent.io)
> > kirjoitti:
> >
> >> KIP-451 was discarded in favor this this KIP. So it seems we are all on
> >> the same page.
> >>
> >>
> >> >> Relating to KIP-448.
> >> >> What kind of alignment did you think about?
> >>
> >> Nothing in particular. Was more or less a random though. Maybe there is
> >> nothing to be aligned. Just wanted to bring it up. :)
> >>
> >>
> >> >> Some thoughts after reading also the comments in KAFKA-6460:
> >> >> To my understand these KIP-448 mock classes need to be fed somehow
> into
> >> >> TopologyTestDriver.
> >> >> I don't know how those KIP-448 mock are planned to be set to
> >> >> TopologyTestDriver.
> >>
> >> KIP-448 is still quite early, and it's a little unclear... Maybe we
> >> should just ignore it for now. Sorry for the distraction with my comment
> >> about it.
> >>
> >>
> >> Please give me some more time to review this KIP in detail and I will
> >> follow up later again.
> >>
> >>
> >> -Matthias
> >>
> >> On 4/26/19 2:25 PM, Jukka Karvanen wrote:
> >> > Yes, my understanding was also that this KIP cover all the requirement
> >> of
> >> > KIP-451.
> >> >
> >> > Relating to KIP-448.
> >> > What kind of alignment did you think about?
> >> >
> >> > Some thoughts after reading also the comments in KAFKA-6460:
> >> > To my understand these KIP-448 mock classes need to be fed somehow
> into
> >> > TopologyTestDriver.
> >> > I don't know how those KIP-448 mock are planned to be set to
> >> > TopologyTestDriver.
> >> >
> >> > On contrast KIP-456 was planned to be on top of unmodified
> >> > TopologyTestDriver and now driver is set to TestInputTopic and
> >> > TestOutputTopic in constructor.
> >> > There are also alternative that these Topic object could be get from
> >> > TopologyTestDriver, but it would require the duplicating the
> >> constructors
> >> > of Topics as methods to
> >> > TopologyTestDriver.
> >> >
> >> > Also related to those Store object when going through the methods in
> >> > TopologyTestDriver I noticed accessing the state stores could be be
> the
> >> > third candidate for helper class or a group of classes.
> >> > So addition to have TestInputTopic and TestOutputTopic, it could be
> also
> >> > TestKeyValueStore, TestWindowStore, ... to follow the logic in this
> >> > KPI-456, but
> >> > it does add not any functionality on top of .already existing
> >> functionality
> >> > *Store classes. So that's why I did not include those.
> >> >
> >> > Jukka
> >> > -
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Not
> >> >
> >> > pe 26. huhtik. 2019 klo 12.03 Matthias J. Sax (matthias@confluent.io)
> >> > kirjoitti:
> >> >
> >> >> Btw: there is also KIP-448. I was wondering if it might be required
> or
> >> >> helpful to align the design of both with each other. Thoughts?
> >> >>
> >> >>
> >> >>
> >> >> On 4/25/19 11:22 PM, Matthias J. Sax wrote:
> >> >>> Thanks for the KIP!
> >> >>>
> >> >>> I was just comparing this KIP with KIP-451 (even if I did not yet
> >> make a
> >> >>> sorrow read over this KIP), and I agree that there is a big overlap.
> >> It
> >> >>> seems that KIP-456 might subsume KIP-451.
> >> >>>
> >> >>> Let's wait on Patrick's input to see how to proceed.
> >> >>>
> >> >>>
> >> >>> -Matthias
> >> >>>
> >> >>> On 4/25/19 12:03 AM, Jukka Karvanen wrote:
> >> >>>> Hi,
> >> >>>>
> >> >>>> If you want to see or test the my current idea of the
> implementation
> >> of
> >> >>>> this KIP, you can check it out in my repo:
> >> >>>>
> >> >>
> >>
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> >> >>>>
> >> >>>>
> >> >>>> After my test with KPI-451  I do not see need for add methods for
> >> >>>> Iterables, but waiting Patrick's clarification of the use case.
> >> >>>>
> >> >>>> Jukka
> >> >>>>
> >> >>>>
> >> >>>> ti 23. huhtik. 2019 klo 15.37 Jukka Karvanen (
> >> >> jukka.karvanen@jukinimi.com)
> >> >>>> kirjoitti:
> >> >>>>
> >> >>>>> Hi All,
> >> >>>>>
> >> >>>>> I would like to start the discussion on KIP-456: Helper classes to
> >> >> make it
> >> >>>>> simpler to write test logic with TopologyTestDriver:
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
> >> >>>>>
> >> >>>>>
> >> >>>>> There is also related KIP adding methods to TopologyTestDriver:
> >> >>>>>
> >> >>>>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
> >> >>>>>
> >> >>>>>
> >> >>>>> I added those new Iterable based methods to this TestOutputTopic
> >> even
> >> >> not
> >> >>>>> tested those myself yet.
> >> >>>>> So this version contains both my original List and Map based
> methods
> >> >> and
> >> >>>>> these new one.
> >> >>>>> Based on the discussion some of these can be dropped, if those are
> >> >> seen as
> >> >>>>> redundant.
> >> >>>>>
> >> >>>>> Best Regards,
> >> >>>>> Jukka
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >>
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Posted by Jukka Karvanen <ju...@jukinimi.com>.
Hi,

Now everything what I planned to add including tests, samples and document
changes are in my branch:
https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics


So I can create PR as soon as this KIP is getting green light to proceed.

Jukka

la 4. toukok. 2019 klo 9.05 Jukka Karvanen (jukka.karvanen@jukinimi.com)
kirjoitti:

> Hi,
>
> New TestInputTopic and TestOutputTopic included to Developer guide testing
> page as alternative,
> The old way with ConsumerRecordFactory and OutputVerifier is not removed.
>
> You can see the proposal here in my branch:
>
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
>
>
> I can create Work In progress pull request if that make commenting
> proposal easier.
> Still planning to add full coverage unit test and sample WordCountDemoTest to
> streams/examples/src/test/java/org/apache/kafka/streams/examples/wordcount,
> if this KIP is accepted.
>
> Jukka
>
>
> ti 30. huhtik. 2019 klo 13.59 Matthias J. Sax (matthias@confluent.io)
> kirjoitti:
>
>> KIP-451 was discarded in favor this this KIP. So it seems we are all on
>> the same page.
>>
>>
>> >> Relating to KIP-448.
>> >> What kind of alignment did you think about?
>>
>> Nothing in particular. Was more or less a random though. Maybe there is
>> nothing to be aligned. Just wanted to bring it up. :)
>>
>>
>> >> Some thoughts after reading also the comments in KAFKA-6460:
>> >> To my understand these KIP-448 mock classes need to be fed somehow into
>> >> TopologyTestDriver.
>> >> I don't know how those KIP-448 mock are planned to be set to
>> >> TopologyTestDriver.
>>
>> KIP-448 is still quite early, and it's a little unclear... Maybe we
>> should just ignore it for now. Sorry for the distraction with my comment
>> about it.
>>
>>
>> Please give me some more time to review this KIP in detail and I will
>> follow up later again.
>>
>>
>> -Matthias
>>
>> On 4/26/19 2:25 PM, Jukka Karvanen wrote:
>> > Yes, my understanding was also that this KIP cover all the requirement
>> of
>> > KIP-451.
>> >
>> > Relating to KIP-448.
>> > What kind of alignment did you think about?
>> >
>> > Some thoughts after reading also the comments in KAFKA-6460:
>> > To my understand these KIP-448 mock classes need to be fed somehow into
>> > TopologyTestDriver.
>> > I don't know how those KIP-448 mock are planned to be set to
>> > TopologyTestDriver.
>> >
>> > On contrast KIP-456 was planned to be on top of unmodified
>> > TopologyTestDriver and now driver is set to TestInputTopic and
>> > TestOutputTopic in constructor.
>> > There are also alternative that these Topic object could be get from
>> > TopologyTestDriver, but it would require the duplicating the
>> constructors
>> > of Topics as methods to
>> > TopologyTestDriver.
>> >
>> > Also related to those Store object when going through the methods in
>> > TopologyTestDriver I noticed accessing the state stores could be be the
>> > third candidate for helper class or a group of classes.
>> > So addition to have TestInputTopic and TestOutputTopic, it could be also
>> > TestKeyValueStore, TestWindowStore, ... to follow the logic in this
>> > KPI-456, but
>> > it does add not any functionality on top of .already existing
>> functionality
>> > *Store classes. So that's why I did not include those.
>> >
>> > Jukka
>> > -
>> >
>> >
>> >
>> >
>> >
>> > Not
>> >
>> > pe 26. huhtik. 2019 klo 12.03 Matthias J. Sax (matthias@confluent.io)
>> > kirjoitti:
>> >
>> >> Btw: there is also KIP-448. I was wondering if it might be required or
>> >> helpful to align the design of both with each other. Thoughts?
>> >>
>> >>
>> >>
>> >> On 4/25/19 11:22 PM, Matthias J. Sax wrote:
>> >>> Thanks for the KIP!
>> >>>
>> >>> I was just comparing this KIP with KIP-451 (even if I did not yet
>> make a
>> >>> sorrow read over this KIP), and I agree that there is a big overlap.
>> It
>> >>> seems that KIP-456 might subsume KIP-451.
>> >>>
>> >>> Let's wait on Patrick's input to see how to proceed.
>> >>>
>> >>>
>> >>> -Matthias
>> >>>
>> >>> On 4/25/19 12:03 AM, Jukka Karvanen wrote:
>> >>>> Hi,
>> >>>>
>> >>>> If you want to see or test the my current idea of the implementation
>> of
>> >>>> this KIP, you can check it out in my repo:
>> >>>>
>> >>
>> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
>> >>>>
>> >>>>
>> >>>> After my test with KPI-451  I do not see need for add methods for
>> >>>> Iterables, but waiting Patrick's clarification of the use case.
>> >>>>
>> >>>> Jukka
>> >>>>
>> >>>>
>> >>>> ti 23. huhtik. 2019 klo 15.37 Jukka Karvanen (
>> >> jukka.karvanen@jukinimi.com)
>> >>>> kirjoitti:
>> >>>>
>> >>>>> Hi All,
>> >>>>>
>> >>>>> I would like to start the discussion on KIP-456: Helper classes to
>> >> make it
>> >>>>> simpler to write test logic with TopologyTestDriver:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
>> >>>>>
>> >>>>>
>> >>>>> There is also related KIP adding methods to TopologyTestDriver:
>> >>>>>
>> >>>>>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
>> >>>>>
>> >>>>>
>> >>>>> I added those new Iterable based methods to this TestOutputTopic
>> even
>> >> not
>> >>>>> tested those myself yet.
>> >>>>> So this version contains both my original List and Map based methods
>> >> and
>> >>>>> these new one.
>> >>>>> Based on the discussion some of these can be dropped, if those are
>> >> seen as
>> >>>>> redundant.
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> Jukka
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>>