Posted to dev@beam.apache.org by Thomas Weise <th...@apache.org> on 2017/09/05 16:37:37 UTC

Generating data stream for testing

Hi,

I'm looking for a suitable starting point for creating an unbounded source
for testing a pipeline. The source should generate data (let's say
KV<String, Integer>) and should also be able to inject watermarks.

I see a couple of implementations, such as TestCountingSource, used for
runner testing. Is UnboundedSource the right starting point for users?

Thanks,
Thomas

Re: Generating data stream for testing

Posted by Etienne Chauchot <ec...@gmail.com>.
Hi Thomas,

Nexmark includes a generator that produces events for use in both batch
and streaming modes:

See the generator: 
https://github.com/apache/beam/blob/master/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/sources/Generator.java

See the source creation: 
https://github.com/apache/beam/blob/master/sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/NexmarkLauncher.java

Best

Etienne


On 06/09/2017 at 23:18, Thomas Weise wrote:
> Hi Eugene,
>
> TestStream is great for functional testing. I was looking for a way to
> continuously generate data instead of specifying it upfront as a
> collection, hence my question regarding the UnboundedSource hierarchy.
>
> Thomas
>
>
> On Tue, Sep 5, 2017 at 10:09 AM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
>> Hi, did you look at TestStream?
>>
>> On Tue, Sep 5, 2017, 9:37 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> I'm looking for a suitable starting point for creating an unbounded source
>>> for testing a pipeline. The source should generate data (let's say
>>> KV<String, Integer>) and should also be able to inject watermarks.
>>>
>>> I see a couple of implementations, such as TestCountingSource, used for
>>> runner testing. Is UnboundedSource the right starting point for users?
>>>
>>> Thanks,
>>> Thomas
>>>


Re: Generating data stream for testing

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
You may find that GenerateSequence serves your needs:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/GenerateSequence.java
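
For readers adapting this suggestion: GenerateSequence emits Long values
(unbounded when no upper bound is set), so producing KV<String, Integer>
records needs a mapping step. Below is a minimal sketch of such a mapping in
plain Java, with no Beam dependency. The class name SequenceToRecord, the
"key-" prefix, and the numKeys parameter are assumptions made for
illustration and are not part of Beam.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical helper (not a Beam class): derives a test record from a sequence number. */
final class SequenceToRecord {
    private SequenceToRecord() {}

    /**
     * Maps sequence number n to a (key, value) pair.
     * Keys cycle over numKeys distinct strings; the value is n folded into int range.
     */
    static Map.Entry<String, Integer> toRecord(long n, int numKeys) {
        if (n < 0 || numKeys <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and numKeys must be > 0");
        }
        String key = "key-" + (n % numKeys);
        int value = (int) (n % Integer.MAX_VALUE); // keeps the value non-negative
        return new SimpleImmutableEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Print the first few records to show how keys cycle.
        for (long n = 0; n < 5; n++) {
            System.out.println(toRecord(n, 3));
        }
    }
}
```

In a pipeline, a function like this would typically be wrapped in a
MapElements or ParDo step applied to the GenerateSequence output, converting
the Map.Entry to a Beam KV.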


Re: Generating data stream for testing

Posted by Thomas Weise <th...@apache.org>.
Hi Eugene,

TestStream is great for functional testing. I was looking for a way to
continuously generate data instead of specifying it upfront as a
collection, hence my question regarding the UnboundedSource hierarchy.

Thomas
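
To make the requirement above concrete, here is a minimal sketch in plain
Java of a generator that produces elements on demand and periodically emits
a watermark. It is not Beam code: WatermarkedGenerator, Item, and the
watermarkEvery parameter are hypothetical names chosen for illustration. In
Beam, an UnboundedSource reader exposes comparable information through
getCurrentTimestamp() and getWatermark(); this sketch only models the idea.

```java
/** Hypothetical stand-in for a streaming test source; not a Beam class. */
final class WatermarkedGenerator {

    /** One output item: either a data element or a watermark. */
    static final class Item {
        final boolean isWatermark;
        final long timestampMillis;
        final String key;  // null for watermarks
        final int value;   // 0 for watermarks

        private Item(boolean isWatermark, long timestampMillis, String key, int value) {
            this.isWatermark = isWatermark;
            this.timestampMillis = timestampMillis;
            this.key = key;
            this.value = value;
        }
    }

    private final long startMillis;
    private final long stepMillis;
    private final int watermarkEvery;
    private long emitted = 0;            // data elements produced so far
    private boolean watermarkDue = false;

    WatermarkedGenerator(long startMillis, long stepMillis, int watermarkEvery) {
        if (stepMillis <= 0 || watermarkEvery <= 0) {
            throw new IllegalArgumentException("stepMillis and watermarkEvery must be > 0");
        }
        this.startMillis = startMillis;
        this.stepMillis = stepMillis;
        this.watermarkEvery = watermarkEvery;
    }

    /** Returns the next item; never ends, so the caller decides when to stop. */
    Item next() {
        if (watermarkDue) {
            watermarkDue = false;
            // Timestamps only increase, so every later element is newer than
            // the timestamp of the last element emitted.
            long lastTimestamp = startMillis + (emitted - 1) * stepMillis;
            return new Item(true, lastTimestamp, null, 0);
        }
        long timestamp = startMillis + emitted * stepMillis;
        String key = "key-" + (emitted % 3);
        int value = (int) (emitted % Integer.MAX_VALUE);
        emitted++;
        watermarkDue = (emitted % watermarkEvery == 0);
        return new Item(false, timestamp, key, value);
    }
}
```

Because next() never terminates the sequence, data does not have to be
specified upfront as a collection; a test can pull as many items as it needs.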



Re: Generating data stream for testing

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hi, did you look at TestStream?
