Posted to dev@flink.apache.org by Thomas Weise <th...@apache.org> on 2018/11/08 05:38:06 UTC

Kinesis consumer e2e test

Hi,

I'm planning to add an end-to-end test for the Kinesis consumer. We have
done something similar at Lyft, using Kinesalite, which can be run as
Docker container.

I see that some tests already make use of Docker, so we can assume it to be
present in the target environment(s)?

I also found the following ticket:
https://issues.apache.org/jira/browse/FLINK-9007

It suggests also covering the producer, which may be a good way to create
the input data as well. The stream itself can be created with the Kinesis
Java SDK.
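
For illustration, roughly what the stream setup could look like with the AWS
Java SDK pointed at a local Kinesalite endpoint (the endpoint, region,
credentials and stream name below are just placeholders, not taken from any
existing test):

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;

public class KinesaliteStreamSetup {

    public static void main(String[] args) throws Exception {
        // Kinesalite is assumed to be running locally, e.g. started as a Docker
        // container by the test script; scheme and port depend on how it is launched.
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.standard()
            .withEndpointConfiguration(
                new AwsClientBuilder.EndpointConfiguration("http://localhost:4567", "us-east-1"))
            .withCredentials(
                new AWSStaticCredentialsProvider(new BasicAWSCredentials("fakeKey", "fakeSecret")))
            .build();

        // Create the input stream with a single shard and wait until it becomes active.
        kinesis.createStream("flink-test-input", 1);
        while (!"ACTIVE".equals(kinesis.describeStream("flink-test-input")
                .getStreamDescription().getStreamStatus())) {
            Thread.sleep(500);
        }
    }
}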

Following the existing layout, there would be a new module
flink-end-to-end-tests/flink-kinesis-test

Are there any suggestions or comments regarding this?

Thanks,
Thomas

Re: Kinesis consumer e2e test

Posted by Till Rohrmann <tr...@apache.org>.
Hi Thomas,

the community is really interested in adding an end-to-end test for the
Kinesis connector (producer as well as consumer). Thus, it would be really
helpful if you could contribute the work you've already done.

Using Kinesalite sounds good to me, and you're right that we can assume
Docker is available in our testing environment. The test job would go
into a separate module as you've suggested
(flink-end-to-end-tests/flink-kinesis-test) and the entrypoint to the test
would go into flink-end-to-end-tests/test-scripts/ plus
flink-end-to-end-tests/run-nightly-tests.sh.

I think Stefan stopped his work on the end-to-end test but he had some
ideas about reusing testing infrastructure for the Kafka and Kinesis tests
(e.g. having a test base for similar connectors). This is something we can
also address after the release if it would entail too much work.

Cheers,
Till

On Thu, Nov 8, 2018 at 7:22 AM Tzu-Li (Gordon) Tai <tz...@apache.org>
wrote:

> Hi Thomas,
>
> I think Stefan Richter is also working on the Kinesis end-to-end test, and
> seems to be planning to implement it against a real Kinesis service instead
> of Kinesalite.
> Perhaps efforts should be synced here.
>
> Cheers,
> Gordon
>
>
> On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise <th...@apache.org> wrote:
>
> > Hi,
> >
> > I'm planning to add an end-to-end test for the Kinesis consumer. We have
> > done something similar at Lyft, using Kinesalite, which can be run as
> > Docker container.
> >
> > I see that some tests already make use of Docker, so we can assume it to
> be
> > present in the target environment(s)?
> >
> > I also found the following ticket:
> > https://issues.apache.org/jira/browse/FLINK-9007
> >
> > It suggests also covering the producer, which may be a good way to create
> > the input data as well. The stream itself can be created with the Kinesis
> > Java SDK.
> >
> > Following the existing layout, there would be a new module
> > flink-end-to-end-tests/flink-kinesis-test
> >
> > Are there any suggestions or comments regarding this?
> >
> > Thanks,
> > Thomas
> >
>

Re: Kinesis consumer e2e test

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

yes, that is correct. The failure mapper is there to trigger a failover event, for which we can then check i) that exactly-once or at-least-once is not violated, depending on the expected semantics, and ii) that the restore works at all ;-). You might be able to reuse org.apache.flink.streaming.tests.FailureMapper for this. For the future, it would surely also be nice to have a test that covers rescaling as well, but for now just having any real test is already a great improvement.
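
To make that concrete, here is a minimal sketch of the idea (not the actual FailureMapper class, whose details may differ): a mapper that throws on its first execution attempt once it has seen n records, so the job fails over exactly once and can then run to completion after the restore.

import org.apache.flink.api.common.functions.RichMapFunction;

// Hypothetical sketch of a mapper that fails once after a configurable number of records.
public class FailOnceMapper<T> extends RichMapFunction<T, T> {

    private final long failAfterRecords;
    private long seenRecords;

    public FailOnceMapper(long failAfterRecords) {
        this.failAfterRecords = failAfterRecords;
    }

    @Override
    public T map(T value) throws Exception {
        seenRecords++;
        // Only fail on the first execution attempt, so that the job can run to
        // completion after the restore from the last checkpoint.
        if (getRuntimeContext().getAttemptNumber() == 0 && seenRecords >= failAfterRecords) {
            throw new RuntimeException("Artificial failure to trigger failover");
        }
        return value;
    }
}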

Best,
Stefan

> On 12. Nov 2018, at 05:23, Thomas Weise <th...@apache.org> wrote:
> 
> Hi Stefan,
> 
> Thanks for the info. So if I understand correctly, the pipeline you had in
> mind is:
> 
> Consumer -> Map -> Producer
> 
> What do you expect as outcome of the mapper failure? That no records are
> lost but some possibly duplicated in the sink?
> 
> Regarding the abstraction, I will see what I can do in that regard. From
> where I start it may make more sense to do some of that as follow-up when
> the Kafka test is ported.
> 
> Thanks,
> Thomas
> 
> 
> On Thu, Nov 8, 2018 at 10:20 AM Stefan Richter <s....@data-artisans.com>
> wrote:
> 
>> Hi,
>> 
>> I was also just planning to work on it before Stephan contacted Thomas to
>> ask about this test.
>> 
>> Thomas, you are right about the structure, the test should also go into
>> the `run-nightly-tests.sh`. What I was planning to do is a simple job that
>> consists of a Kinesis consumer, a mapper that fails once after n records,
>> and a Kinesis producer. I was hoping that creation, filling, and validation
>> of the Kinesis streams can be done with the Java API, not by invoking
>> commands in a bash script. In general I would try to minimise the amount of
>> scripting and do as much in Java as possible. It would also be nice if the
>> test was generalised, e.g. that abstract Producer/Consumer are created from
>> a Supplier and also the validation is done over some abstraction that lets
>> us iterate over the produced output. Ideally, this would be a test that we
>> can reuse for all Consumer/Producer cases and we could also port the tests
>> for Kafka to that. What do you think?
>> 
>> Best,
>> Stefan
>> 
>>> On 8. Nov 2018, at 07:22, Tzu-Li (Gordon) Tai <tz...@apache.org>
>> wrote:
>>> 
>>> Hi Thomas,
>>> 
>>> I think Stefan Richter is also working on the Kinesis end-to-end test,
>> and
>>> seems to be planning to implement it against a real Kinesis service
>> instead
>>> of Kinesalite.
>>> Perhaps efforts should be synced here.
>>> 
>>> Cheers,
>>> Gordon
>>> 
>>> 
>>> On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise <th...@apache.org> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm planning to add an end-to-end test for the Kinesis consumer. We have
>>>> done something similar at Lyft, using Kinesalite, which can be run as
>>>> Docker container.
>>>> 
>>>> I see that some tests already make use of Docker, so we can assume it
>> to be
>>>> present in the target environment(s)?
>>>> 
>>>> I also found the following ticket:
>>>> https://issues.apache.org/jira/browse/FLINK-9007
>>>> 
>>>> It suggests also covering the producer, which may be a good way to create
>>>> the input data as well. The stream itself can be created with the
>> Kinesis
>>>> Java SDK.
>>>> 
>>>> Following the existing layout, there would be a new module
>>>> flink-end-to-end-tests/flink-kinesis-test
>>>> 
>>>> Are there any suggestions or comments regarding this?
>>>> 
>>>> Thanks,
>>>> Thomas
>>>> 
>> 
>> 


Re: Kinesis consumer e2e test

Posted by Thomas Weise <th...@apache.org>.
Hi Stefan,

Thanks for the info. So if I understand correctly, the pipeline you had in
mind is:

Consumer -> Map -> Producer

What do you expect as the outcome of the mapper failure? That no records are
lost, but some may be duplicated in the sink?
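
For reference, roughly how that pipeline could be wired up (stream names, config
values and the FailOnceMapper are placeholders; FailOnceMapper stands for a
hypothetical mapper that fails once after n records, not existing test code):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisEndToEndTestJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is needed so the restore after the induced failure has
        // something to recover from.
        env.enableCheckpointing(1000L);

        Properties config = new Properties();
        config.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        // Endpoint and credential settings for Kinesalite would also go here.

        FlinkKinesisConsumer<String> consumer =
            new FlinkKinesisConsumer<>("test-input", new SimpleStringSchema(), config);

        FlinkKinesisProducer<String> producer =
            new FlinkKinesisProducer<>(new SimpleStringSchema(), config);
        producer.setDefaultStream("test-output");
        producer.setDefaultPartition("0");

        env.addSource(consumer)
            .map(new FailOnceMapper<String>(1000))  // hypothetical failure mapper
            .addSink(producer);

        env.execute("Kinesis connector end-to-end test");
    }
}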

Regarding the abstraction, I will see what I can do. Given where I am
starting from, it may make more sense to do some of that as a follow-up when
the Kafka test is ported.

Thanks,
Thomas


On Thu, Nov 8, 2018 at 10:20 AM Stefan Richter <s....@data-artisans.com>
wrote:

> Hi,
>
> I was also just planning to work on it before Stephan contacted Thomas to
> ask about this test.
>
> Thomas, you are right about the structure, the test should also go into
> the `run-nightly-tests.sh`. What I was planning to do is a simple job that
> consists of a Kinesis consumer, a mapper that fails once after n records,
> and a Kinesis producer. I was hoping that creation, filling, and validation
> of the Kinesis streams can be done with the Java API, not by invoking
> commands in a bash script. In general I would try to minimise the amount of
> scripting and do as much in Java as possible. It would also be nice if the
> test was generalised, e.g. that abstract Producer/Consumer are created from
> a Supplier and also the validation is done over some abstraction that lets
> us iterate over the produced output. Ideally, this would be a test that we
> can reuse for all Consumer/Producer cases and we could also port the tests
> for Kafka to that. What do you think?
>
> Best,
> Stefan
>
> > On 8. Nov 2018, at 07:22, Tzu-Li (Gordon) Tai <tz...@apache.org>
> wrote:
> >
> > Hi Thomas,
> >
> > I think Stefan Richter is also working on the Kinesis end-to-end test,
> and
> > seems to be planning to implement it against a real Kinesis service
> instead
> > of Kinesalite.
> > Perhaps efforts should be synced here.
> >
> > Cheers,
> > Gordon
> >
> >
> > On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise <th...@apache.org> wrote:
> >
> >> Hi,
> >>
> >> I'm planning to add an end-to-end test for the Kinesis consumer. We have
> >> done something similar at Lyft, using Kinesalite, which can be run as
> >> Docker container.
> >>
> >> I see that some tests already make use of Docker, so we can assume it
> to be
> >> present in the target environment(s)?
> >>
> >> I also found the following ticket:
> >> https://issues.apache.org/jira/browse/FLINK-9007
> >>
> >> It suggests also covering the producer, which may be a good way to create
> >> the input data as well. The stream itself can be created with the
> Kinesis
> >> Java SDK.
> >>
> >> Following the existing layout, there would be a new module
> >> flink-end-to-end-tests/flink-kinesis-test
> >>
> >> Are there any suggestions or comments regarding this?
> >>
> >> Thanks,
> >> Thomas
> >>
>
>

Re: Kinesis consumer e2e test

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

I was also just planning to work on it before Stephan contacted Thomas to ask about this test.

Thomas, you are right about the structure; the test should also go into `run-nightly-tests.sh`. What I was planning to do is a simple job that consists of a Kinesis consumer, a mapper that fails once after n records, and a Kinesis producer. I was hoping that creation, filling, and validation of the Kinesis streams can be done with the Java API, not by invoking commands in a bash script. In general I would try to minimise the amount of scripting and do as much in Java as possible. It would also be nice if the test were generalised, e.g. so that abstract producer/consumer instances are created from a Supplier and the validation is done over some abstraction that lets us iterate over the produced output. Ideally, this would be a test that we can reuse for all consumer/producer cases, and we could also port the Kafka tests to it. What do you think?
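
To make the generalisation a bit more concrete, a purely hypothetical sketch (class and method names below are made up, not existing Flink code):

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Connector-agnostic test base: a concrete test (Kinesis, Kafka, ...) only
// supplies factories for the source and sink under test plus a way to iterate
// over the produced output for validation.
public abstract class SourceSinkTestBase<T> {

    // Factory for the consumer under test, e.g. a FlinkKinesisConsumer.
    protected abstract Supplier<SourceFunction<T>> sourceSupplier();

    // Factory for the producer under test, e.g. a FlinkKinesisProducer.
    protected abstract Supplier<SinkFunction<T>> sinkSupplier();

    // Lets the validation iterate over whatever the sink actually wrote.
    protected abstract Iterator<T> producedRecords();

    // At-least-once check: every expected record shows up, duplicates are tolerated.
    protected void validateAtLeastOnce(Set<T> expected) {
        Set<T> missing = new HashSet<>(expected);
        Iterator<T> it = producedRecords();
        while (it.hasNext()) {
            missing.remove(it.next());
        }
        if (!missing.isEmpty()) {
            throw new AssertionError("Missing records: " + missing);
        }
    }
}

A Kinesis test would then supply FlinkKinesisConsumer/FlinkKinesisProducer factories and read the output stream back for validation, and a ported Kafka test would plug in the Kafka equivalents.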

Best,
Stefan

> On 8. Nov 2018, at 07:22, Tzu-Li (Gordon) Tai <tz...@apache.org> wrote:
> 
> Hi Thomas,
> 
> I think Stefan Richter is also working on the Kinesis end-to-end test, and
> seems to be planning to implement it against a real Kinesis service instead
> of Kinesalite.
> Perhaps efforts should be synced here.
> 
> Cheers,
> Gordon
> 
> 
> On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise <th...@apache.org> wrote:
> 
>> Hi,
>> 
>> I'm planning to add an end-to-end test for the Kinesis consumer. We have
>> done something similar at Lyft, using Kinesalite, which can be run as
>> Docker container.
>> 
>> I see that some tests already make use of Docker, so we can assume it to be
>> present in the target environment(s)?
>> 
>> I also found the following ticket:
>> https://issues.apache.org/jira/browse/FLINK-9007
>> 
>> It suggests also covering the producer, which may be a good way to create
>> the input data as well. The stream itself can be created with the Kinesis
>> Java SDK.
>> 
>> Following the existing layout, there would be a new module
>> flink-end-to-end-tests/flink-kinesis-test
>> 
>> Are there any suggestions or comments regarding this?
>> 
>> Thanks,
>> Thomas
>> 


Re: Kinesis consumer e2e test

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Thomas,

I think Stefan Richter is also working on the Kinesis end-to-end test, and
seems to be planning to implement it against a real Kinesis service instead
of Kinesalite.
Perhaps efforts should be synced here.

Cheers,
Gordon


On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise <th...@apache.org> wrote:

> Hi,
>
> I'm planning to add an end-to-end test for the Kinesis consumer. We have
> done something similar at Lyft, using Kinesalite, which can be run as
> Docker container.
>
> I see that some tests already make use of Docker, so we can assume it to be
> present in the target environment(s)?
>
> I also found the following ticket:
> https://issues.apache.org/jira/browse/FLINK-9007
>
> It suggests also covering the producer, which may be a good way to create
> the input data as well. The stream itself can be created with the Kinesis
> Java SDK.
>
> Following the existing layout, there would be a new module
> flink-end-to-end-tests/flink-kinesis-test
>
> Are there any suggestions or comments regarding this?
>
> Thanks,
> Thomas
>