You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by David Jost <da...@uniberg.com> on 2022/07/04 08:23:29 UTC

How to mock new DataSource/Sink

Hi,

we are currently looking at replacing our sinks and sources with the respective counterparts using the 'new' data source/sink API (mainly Kafka). What holds us back is that we are not sure how to test the pipeline with mocked sources/sinks. Up till now, we somewhat followed the 'Testing docs'[0] and created a simple SinkFunction, as well as a ParallelSourceFunction, where we could get data in and out at our leisure. They could be easily plugged into the pipeline for the tests. But with the new API, it seems way too cumbersome to go such an approach, as there is a lot of overhead in creating a sink or source on your own (now).

I would love to know, what the intended or recommended way is here. I know, that I can still use the old API, but that a) feels wrong, and b) requires us to expose the DataStream, which is not necessary in the current setup.

I appreciate any ideas or even examples on this.

Thank you in advance.

Best
  David


[0]: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/testing/#testing-flink-jobs

Re: How to mock new DataSource/Sink

Posted by David Jost <da...@uniberg.com>.
> On 5. Jul 2022, at 01:48, Alexander Fedulov <al...@ververica.com> wrote:
> 
> Hi David,
> 
> I started working on FLIP-238 exactly with the concerns you've mentioned in mind. It is currently in development, feel free to join the discussion [1]. If you need something ASAP and are not interested in rate-limiting functionality, you could drop in this [2] class into your tests suite (this version is standalone and does not require changes in any other classes). The usage is as indicated in the FLIP (minus sourceRatePerSecond parameter) [3].
> 
> [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
> [2] https://github.com/afedulov/flink/blob/FLINK-27919-generator-source/flink-core/src/main/java/org/apache/flink/api/connector/source/lib/DataGeneratorSourceV0.java
> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-238%3A+Introduce+FLIP-27-based+Data+Generator+Source#:~:text=%7D-,Usage%3A%C2%A0,-The%20envisioned%20usage
> 
> Best,
> Alexander Fedulov


> On Mon, Jul 4, 2022 at 12:51 PM Chesnay Schepler <ch...@apache.org> wrote:
> It is indeed not easy to mock sources/sink with the new interfaces.
> 
> There is an effort to make this easier for sources in the future (FLIP-238).
> 
> For the time being I'd stick with the old APIs for mock sources/sinks.
> 


Hi Chesnay and Alexander,

thank you both for your answers and pointing me to FLIP-238. I will definitively have a look at your DataGeneratorSource, Alexander; thank you. Then I will only need a sink to go with it. :) But knowing what the state is helps tremendously with our decisions.

Thank you again for you help and have a great day.

Best
  David

Re: How to mock new DataSource/Sink

Posted by Alexander Fedulov <al...@ververica.com>.
Hi David,

I started working on FLIP-238 exactly with the concerns you've mentioned in
mind. It is currently in development, feel free to join the discussion [1].
If you need something ASAP and are not interested in rate-limiting
functionality, you could drop in this [2] class into your tests suite (this
version is standalone and does not require changes in any other classes).
The usage is as indicated in the FLIP (minus sourceRatePerSecond parameter)
[3].

[1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
[2]
https://github.com/afedulov/flink/blob/FLINK-27919-generator-source/flink-core/src/main/java/org/apache/flink/api/connector/source/lib/DataGeneratorSourceV0.java
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-238%3A+Introduce+FLIP-27-based+Data+Generator+Source#:~:text=%7D-,Usage%3A%C2%A0,-The%20envisioned%20usage

Best,
Alexander Fedulov


On Mon, Jul 4, 2022 at 12:51 PM Chesnay Schepler <ch...@apache.org> wrote:

> It is indeed not easy to mock sources/sink with the new interfaces.
>
> There is an effort to make this easier for sources in the future (FLIP-238
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-238%3A+Introduce+FLIP-27-based+Data+Generator+Source>
> ).
>
> For the time being I'd stick with the old APIs for mock sources/sinks.
>
> On 04/07/2022 10:23, David Jost wrote:
>
> Hi,
>
> we are currently looking at replacing our sinks and sources with the respective counterparts using the 'new' data source/sink API (mainly Kafka). What holds us back is that we are not sure how to test the pipeline with mocked sources/sinks. Up till now, we somewhat followed the 'Testing docs'[0] and created a simple SinkFunction, as well as a ParallelSourceFunction, where we could get data in and out at our leisure. They could be easily plugged into the pipeline for the tests. But with the new API, it seems way too cumbersome to go such an approach, as there is a lot of overhead in creating a sink or source on your own (now).
>
> I would love to know, what the intended or recommended way is here. I know, that I can still use the old API, but that a) feels wrong, and b) requires us to expose the DataStream, which is not necessary in the current setup.
>
> I appreciate any ideas or even examples on this.
>
> Thank you in advance.
>
> Best
>   David
>
>
> [0]: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/testing/#testing-flink-jobs
>
>
>

Re: How to mock new DataSource/Sink

Posted by Chesnay Schepler <ch...@apache.org>.
It is indeed not easy to mock sources/sink with the new interfaces.

There is an effort to make this easier for sources in the future 
(FLIP-238 
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-238%3A+Introduce+FLIP-27-based+Data+Generator+Source>).

For the time being I'd stick with the old APIs for mock sources/sinks.

On 04/07/2022 10:23, David Jost wrote:
> Hi,
>
> we are currently looking at replacing our sinks and sources with the respective counterparts using the 'new' data source/sink API (mainly Kafka). What holds us back is that we are not sure how to test the pipeline with mocked sources/sinks. Up till now, we somewhat followed the 'Testing docs'[0] and created a simple SinkFunction, as well as a ParallelSourceFunction, where we could get data in and out at our leisure. They could be easily plugged into the pipeline for the tests. But with the new API, it seems way too cumbersome to go such an approach, as there is a lot of overhead in creating a sink or source on your own (now).
>
> I would love to know, what the intended or recommended way is here. I know, that I can still use the old API, but that a) feels wrong, and b) requires us to expose the DataStream, which is not necessary in the current setup.
>
> I appreciate any ideas or even examples on this.
>
> Thank you in advance.
>
> Best
>    David
>
>
> [0]:https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/testing/#testing-flink-jobs