You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Shen Li <cs...@gmail.com> on 2017/04/29 19:50:49 UTC

What's the easiest way for an application to convert an Iterable to an UnboundedSource

It seems that Create.of(Iterable) can only create a BoundedSource. Is there
a convenient way to read from an unbounded Iterable object without writing
application code to wrap it into an UnboundedSource object?


Thanks,

Shen

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Dan Halperin <dh...@google.com.INVALID>.
Hi Shen,

Most runners are expected to use `UnboundedReadFromBoundedSource` (in
`runners-core-construction`) to convert a BoundedSource to an
UnboundedSource if that is the semantics they need.

As Eugene says, I suspect you can also get similar behavior with a
SplittableDoFn.

Dan

On Sat, Apr 29, 2017 at 10:56 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Yes, sorry, TestStream (a bit early in the morning ;)).
>
> Thanks to have corrected me ;)
>
> Regards
> JB
>
>
> On 04/30/2017 07:17 AM, Eugene Kirpichov wrote:
>
>> Clarification: likely you meant TestStream
>> <https://github.com/apache/beam/blob/master/sdks/java/core/
>> src/main/java/org/apache/beam/sdk/testing/TestStream.java>
>> ?
>>
>> On Sat, Apr 29, 2017 at 10:03 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>> Hi,
>>>
>>> In addition of Eugene and Jesse's answers, for testing purpose, it's
>>> possible to
>>> use CreateStream in some runners. It allows us to test streaming.
>>>
>>> Regards
>>> JB
>>>
>>> On 04/29/2017 09:50 PM, Shen Li wrote:
>>>
>>>> It seems that Create.of(Iterable) can only create a BoundedSource. Is
>>>>
>>> there
>>>
>>>> a convenient way to read from an unbounded Iterable object without
>>>>
>>> writing
>>>
>>>> application code to wrap it into an UnboundedSource object?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Shen
>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Yes, sorry, TestStream (a bit early in the morning ;)).

Thanks to have corrected me ;)

Regards
JB

On 04/30/2017 07:17 AM, Eugene Kirpichov wrote:
> Clarification: likely you meant TestStream
> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java>
> ?
>
> On Sat, Apr 29, 2017 at 10:03 PM Jean-Baptiste Onofr� <jb...@nanthrax.net>
> wrote:
>
>> Hi,
>>
>> In addition of Eugene and Jesse's answers, for testing purpose, it's
>> possible to
>> use CreateStream in some runners. It allows us to test streaming.
>>
>> Regards
>> JB
>>
>> On 04/29/2017 09:50 PM, Shen Li wrote:
>>> It seems that Create.of(Iterable) can only create a BoundedSource. Is
>> there
>>> a convenient way to read from an unbounded Iterable object without
>> writing
>>> application code to wrap it into an UnboundedSource object?
>>>
>>>
>>> Thanks,
>>>
>>> Shen
>>>
>>
>> --
>> Jean-Baptiste Onofr�
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Clarification: likely you meant TestStream
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestStream.java>
?

On Sat, Apr 29, 2017 at 10:03 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi,
>
> In addition of Eugene and Jesse's answers, for testing purpose, it's
> possible to
> use CreateStream in some runners. It allows us to test streaming.
>
> Regards
> JB
>
> On 04/29/2017 09:50 PM, Shen Li wrote:
> > It seems that Create.of(Iterable) can only create a BoundedSource. Is
> there
> > a convenient way to read from an unbounded Iterable object without
> writing
> > application code to wrap it into an UnboundedSource object?
> >
> >
> > Thanks,
> >
> > Shen
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

In addition of Eugene and Jesse's answers, for testing purpose, it's possible to 
use CreateStream in some runners. It allows us to test streaming.

Regards
JB

On 04/29/2017 09:50 PM, Shen Li wrote:
> It seems that Create.of(Iterable) can only create a BoundedSource. Is there
> a convenient way to read from an unbounded Iterable object without writing
> application code to wrap it into an UnboundedSource object?
>
>
> Thanks,
>
> Shen
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Shen Li <cs...@gmail.com>.
Thanks Jesse :)

On Sat, Apr 29, 2017 at 4:40 PM, Jesse Anderson <je...@bigdatainstitute.io>
wrote:

> Here's some code that's similar to what you're asking for
> https://github.com/eljefe6a/beamexample/blob/master/
> BeamTutorial/src/main/java/org/apache/beam/examples/
> tutorial/game/injector/InjectorBoundedSource.java
>
> On Sat, Apr 29, 2017 at 1:23 PM Shen Li <cs...@gmail.com> wrote:
>
> > Thanks!
> >
> > Shen
> >
> > On Sat, Apr 29, 2017 at 4:08 PM, Eugene Kirpichov <
> > kirpichov@google.com.invalid> wrote:
> >
> > > Hi Shen,
> > >
> > > This is a very nice suggestion. Currently there is no way to do this,
> > > probably because nobody thought of this before, but here's a few
> thoughts
> > > anyway.
> > >
> > > - Both the Iterable and its Iterator will need to be Serializable,
> > because
> > > an UnboundedSource must be able to checkpoint and resume, to provide
> > fault
> > > tolerance in case the worker reading from it crashes. Do your iterables
> > > satisfy this constraint?
> > > - Reading will, of course, be sequential rather than parallel;
> processing
> > > can still be parallelized, though. I suppose that's fine for your use
> > case.
> > > - Once you have that - wrapping an UnboundedSource will be possible and
> > an
> > > interesting exercise. And, I believe, wrapping it with a splittable
> DoFn
> > > http://s.apache.org/splittable-do-fn will be much easier, though SDF
> > > support is yet inconsistent between runners (Direct works, Flink works,
> > > Apex and Dataflow in review). It'd actually be a good test case of the
> > ease
> > > of use of the API.
> > >
> > > On Sat, Apr 29, 2017 at 12:50 PM Shen Li <cs...@gmail.com> wrote:
> > >
> > > > It seems that Create.of(Iterable) can only create a BoundedSource. Is
> > > there
> > > > a convenient way to read from an unbounded Iterable object without
> > > writing
> > > > application code to wrap it into an UnboundedSource object?
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Shen
> > > >
> > >
> >
> --
> Thanks,
>
> Jesse
>

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Jesse Anderson <je...@bigdatainstitute.io>.
Here's some code that's similar to what you're asking for
https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/src/main/java/org/apache/beam/examples/tutorial/game/injector/InjectorBoundedSource.java

On Sat, Apr 29, 2017 at 1:23 PM Shen Li <cs...@gmail.com> wrote:

> Thanks!
>
> Shen
>
> On Sat, Apr 29, 2017 at 4:08 PM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
> > Hi Shen,
> >
> > This is a very nice suggestion. Currently there is no way to do this,
> > probably because nobody thought of this before, but here's a few thoughts
> > anyway.
> >
> > - Both the Iterable and its Iterator will need to be Serializable,
> because
> > an UnboundedSource must be able to checkpoint and resume, to provide
> fault
> > tolerance in case the worker reading from it crashes. Do your iterables
> > satisfy this constraint?
> > - Reading will, of course, be sequential rather than parallel; processing
> > can still be parallelized, though. I suppose that's fine for your use
> case.
> > - Once you have that - wrapping an UnboundedSource will be possible and
> an
> > interesting exercise. And, I believe, wrapping it with a splittable DoFn
> > http://s.apache.org/splittable-do-fn will be much easier, though SDF
> > support is yet inconsistent between runners (Direct works, Flink works,
> > Apex and Dataflow in review). It'd actually be a good test case of the
> ease
> > of use of the API.
> >
> > On Sat, Apr 29, 2017 at 12:50 PM Shen Li <cs...@gmail.com> wrote:
> >
> > > It seems that Create.of(Iterable) can only create a BoundedSource. Is
> > there
> > > a convenient way to read from an unbounded Iterable object without
> > writing
> > > application code to wrap it into an UnboundedSource object?
> > >
> > >
> > > Thanks,
> > >
> > > Shen
> > >
> >
>
-- 
Thanks,

Jesse

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Shen Li <cs...@gmail.com>.
Thanks!

Shen

On Sat, Apr 29, 2017 at 4:08 PM, Eugene Kirpichov <
kirpichov@google.com.invalid> wrote:

> Hi Shen,
>
> This is a very nice suggestion. Currently there is no way to do this,
> probably because nobody thought of this before, but here's a few thoughts
> anyway.
>
> - Both the Iterable and its Iterator will need to be Serializable, because
> an UnboundedSource must be able to checkpoint and resume, to provide fault
> tolerance in case the worker reading from it crashes. Do your iterables
> satisfy this constraint?
> - Reading will, of course, be sequential rather than parallel; processing
> can still be parallelized, though. I suppose that's fine for your use case.
> - Once you have that - wrapping an UnboundedSource will be possible and an
> interesting exercise. And, I believe, wrapping it with a splittable DoFn
> http://s.apache.org/splittable-do-fn will be much easier, though SDF
> support is yet inconsistent between runners (Direct works, Flink works,
> Apex and Dataflow in review). It'd actually be a good test case of the ease
> of use of the API.
>
> On Sat, Apr 29, 2017 at 12:50 PM Shen Li <cs...@gmail.com> wrote:
>
> > It seems that Create.of(Iterable) can only create a BoundedSource. Is
> there
> > a convenient way to read from an unbounded Iterable object without
> writing
> > application code to wrap it into an UnboundedSource object?
> >
> >
> > Thanks,
> >
> > Shen
> >
>

Re: What's the easiest way for an application to convert an Iterable to an UnboundedSource

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hi Shen,

This is a very nice suggestion. Currently there is no way to do this,
probably because nobody thought of this before, but here's a few thoughts
anyway.

- Both the Iterable and its Iterator will need to be Serializable, because
an UnboundedSource must be able to checkpoint and resume, to provide fault
tolerance in case the worker reading from it crashes. Do your iterables
satisfy this constraint?
- Reading will, of course, be sequential rather than parallel; processing
can still be parallelized, though. I suppose that's fine for your use case.
- Once you have that - wrapping an UnboundedSource will be possible and an
interesting exercise. And, I believe, wrapping it with a splittable DoFn
http://s.apache.org/splittable-do-fn will be much easier, though SDF
support is yet inconsistent between runners (Direct works, Flink works,
Apex and Dataflow in review). It'd actually be a good test case of the ease
of use of the API.

On Sat, Apr 29, 2017 at 12:50 PM Shen Li <cs...@gmail.com> wrote:

> It seems that Create.of(Iterable) can only create a BoundedSource. Is there
> a convenient way to read from an unbounded Iterable object without writing
> application code to wrap it into an UnboundedSource object?
>
>
> Thanks,
>
> Shen
>