You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Eugene Kirpichov <ki...@google.com> on 2018/06/26 00:12:52 UTC

Re: Unbounded source translation for portable pipelines

Hi!

Wanted to let you know that I've just merged the PR that adds
checkpointable SDF support to the portable reference runner (ULR) and the
Java SDK harness:

https://github.com/apache/beam/pull/5566

So now we have a reference implementation of SDF support in a portable
runner, and a reference implementation of SDF support in a portable SDK
harness.
From here on, we need to replicate this support in other portable runners
and other harnesses. The obvious targets are Flink and Python respectively.

Chamikara was going to work on the Python harness. +Thomas Weise
<th...@apache.org> Would you be interested in the Flink portable streaming
runner side? It is of course blocked by having the rest of that runner
working in streaming mode though (the batch mode is practically done - will
send you a separate note about the status of that).

On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Luke is right - unbounded sources should go through SDF. I am currently
> working on adding such support to Fn API.
> The relevant document is s.apache.org/beam-breaking-fusion (note: it
> focuses on a much more general case, but also considers in detail the
> specific case of running unbounded sources on Fn API), and the first
> related PR is https://github.com/apache/beam/pull/4743 .
>
> Ways you can help speed up this effort:
> - Make necessary changes to Apex runner per se to support regular SDFs in
> streaming (without portability). They will likely largely carry over to
> portable world. I recall that the Apex runner had some level of support of
> SDFs, but didn't pass the ValidatesRunner tests yet.
> - (general to Beam, not Apex-related per se) Implement the translation of
> Read.from(UnboundedSource) via impulse, which will require implementing an
> SDF that reads from a given UnboundedSource (taking the UnboundedSource as
> an element). This should be fairly straightforward and will allow all
> portable runners to take advantage of existing UnboundedSource's.
>
>
> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> Using impulse is a precursor for both bounded and unbounded SDF.
>>
>> This JIRA represents the work that would be to add support for unbounded
>> SDF using portability APIs:
>> https://issues.apache.org/jira/browse/BEAM-2939
>>
>>
>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> So for streaming, we will need the Impulse translation for bounded
>>> input, identical with batch, and then in addition to that support for SDF?
>>>
>>> Any pointers what's involved in adding the SDF support? Is it runner
>>> specific? Does the ULR cover it?
>>>
>>>
>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> All "sources" in portability will use splittable DoFns for execution.
>>>>
>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>> sources to get a minimum viable pipeline working.
>>>> For bounded pipelines, a DoFn can read the contents of a bounded source.
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>
>>>>> What is the intended path to support unbounded sources?
>>>>>
>>>>> The goal here is to get a minimum translation working that will allow
>>>>> streaming wordcount execution.
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>

Re: Unbounded source translation for portable pipelines

Posted by Lukasz Cwik <lc...@google.com>.
It would be great to have the ValidatesRunner suite of tests start
executing against Flink/ULR as it will make sure things don't break and are
reproducible.

On Wed, Jun 27, 2018 at 12:34 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Hi!
>
> Those instructions are not current and I think should be discarded as they
> referred to a particular effort that is over - +Ankur Goenka
> <go...@google.com> is, I believe, working on the remaining finishing
> touches for running from a clean clone of Beam master and documenting how
> to do that; could you help Thomas so we can start looking at what the
> streaming runner is missing?
>
> We'll need to document this in a more prominent place. When we get to a
> state where we can run Python WordCount from master, we'll need to document
> it somewhere on the main portability page and/or the getting started guide;
> when we can run something more serious, e.g. Tensorflow pipelines, that
> will be worth a Beam blog post and worth documenting in the TFX
> documentation.
>
> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <th...@apache.org> wrote:
>
>> Hi Eugene,
>>
>> The basic streaming translation is already in place from the prototype,
>> though I have not verified it on the master branch yet.
>>
>> Are the user instructions for the portable Flink runner at
>> https://s.apache.org/beam-portability-team-doc current?
>>
>> (I don't have a dependency on SDF since we are going to use custom native
>> Flink sources/sinks at this time.)
>>
>> Thanks,
>> Thomas
>>
>>
>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
>> wrote:
>>
>>> Hi!
>>>
>>> Wanted to let you know that I've just merged the PR that adds
>>> checkpointable SDF support to the portable reference runner (ULR) and the
>>> Java SDK harness:
>>>
>>> https://github.com/apache/beam/pull/5566
>>>
>>> So now we have a reference implementation of SDF support in a portable
>>> runner, and a reference implementation of SDF support in a portable SDK
>>> harness.
>>> From here on, we need to replicate this support in other portable
>>> runners and other harnesses. The obvious targets are Flink and Python
>>> respectively.
>>>
>>> Chamikara was going to work on the Python harness. +Thomas Weise
>>> <th...@apache.org> Would you be interested in the Flink portable
>>> streaming runner side? It is of course blocked by having the rest of that
>>> runner working in streaming mode though (the batch mode is practically done
>>> - will send you a separate note about the status of that).
>>>
>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <ki...@google.com>
>>> wrote:
>>>
>>>> Luke is right - unbounded sources should go through SDF. I am currently
>>>> working on adding such support to Fn API.
>>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>>>> focuses on a much more general case, but also considers in detail the
>>>> specific case of running unbounded sources on Fn API), and the first
>>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>>
>>>> Ways you can help speed up this effort:
>>>> - Make necessary changes to Apex runner per se to support regular SDFs
>>>> in streaming (without portability). They will likely largely carry over to
>>>> portable world. I recall that the Apex runner had some level of support of
>>>> SDFs, but didn't pass the ValidatesRunner tests yet.
>>>> - (general to Beam, not Apex-related per se) Implement the translation
>>>> of Read.from(UnboundedSource) via impulse, which will require implementing
>>>> an SDF that reads from a given UnboundedSource (taking the UnboundedSource
>>>> as an element). This should be fairly straightforward and will allow all
>>>> portable runners to take advantage of existing UnboundedSource's.
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>>
>>>>> This JIRA represents the work that would be to add support for
>>>>> unbounded SDF using portability APIs:
>>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>>>
>>>>>> Any pointers what's involved in adding the SDF support? Is it runner
>>>>>> specific? Does the ULR cover it?
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> All "sources" in portability will use splittable DoFns for execution.
>>>>>>>
>>>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>>>> sources to get a minimum viable pipeline working.
>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>>> source.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>>>>
>>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>>
>>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>>> allow streaming wordcount execution.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>

Re: Portable pipelines on Flink

Posted by Thomas Weise <th...@apache.org>.
On Wed, Jul 4, 2018 at 10:06 AM Thomas Weise <th...@apache.org> wrote:

> * The gradle task to run job server takes really long to start the server
> (> 1 minute on my macbook). Every restart appears to occur the same
> penalty, regardless of whether there are changes or not. Following can be
> used to just start the server:
>
>  java -jar
> ./runners/flink/job-server/build/install/beam-runners-flink_2.11-job-server-shadow/lib/beam-runners-flink_2.11-job-server-2.6.0-SNAPSHOT-shaded.jar
> "--job-host=localhost:8099" "--artifacts-dir=/tmp/flink-artifacts"
>
>
Upon further inspection, the gradle task does the right thing. Once the
server is built, a control-C and restart will take only a few seconds.

I was able to get the basic streaming translation work for a simple example
pipeline [1] on top of changes in [2]

Note that currently we are lacking the capability for the runner to
retrieve the pipeline options (they are dropped by the Python client). IMO
that is something that needs to be addressed for Flink MVP, please see
related discussion on the PR.

[1] https://gist.github.com/tweise/5b4158fd040a263ddd6bdb885ffc0f3f
[2] https://github.com/apache/beam/pull/5888

Re: Portable pipelines on Flink

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Jul 4, 2018 at 1:06 AM Thomas Weise <th...@apache.org> wrote:

> [subject change for discussion fork]
>
> Thanks for the steps. I'm able to run the Python wordcount example, though
> it fails with local file output. Did you test with distributed FS or local
> FS?
>

It doesn't work with local FS because the code running in docker doesn't
have access to the local FS. It would be good to automatically support this
somehow (with mount points?) but exactly what to do here is non-trivial.

Re: Portable pipelines on Flink

Posted by Thomas Weise <th...@apache.org>.
>
>
>
> * Job server does not currently provide log output.
>

It is currently configured with the simple logger backend, so to get
console output just add  -Dorg.slf4j.simpleLogger.defaultLogLevel=debug to
the command line.

Portable pipelines on Flink

Posted by Thomas Weise <th...@apache.org>.
[subject change for discussion fork]

Thanks for the steps. I'm able to run the Python wordcount example, though
it fails with local file output. Did you test with distributed FS or local
FS?

Some more observations so far:

* Job server does not currently provide log output.
* The gradle task to run job server takes really long to start the server
(> 1 minute on my macbook). Every restart appears to occur the same
penalty, regardless of whether there are changes or not. Following can be
used to just start the server:

 java -jar
./runners/flink/job-server/build/install/beam-runners-flink_2.11-job-server-shadow/lib/beam-runners-flink_2.11-job-server-2.6.0-SNAPSHOT-shaded.jar
"--job-host=localhost:8099" "--artifacts-dir=/tmp/flink-artifacts"

* The gradle command to run wordcount (./gradlew
:beam-sdks-python:portableWordCount) does not succeed in my macbook
environment. Following error:
https://gist.github.com/tweise/33039eb288161528b859885d06f3ebf3 This goes
away when I setup virtualenv manually and run:

python -m apache_beam.examples.wordcount --input /etc/profile --output
/tmp/py-wordcount-direct --experiments=beam_fn_api --runner=PortableRunner
--job_endpoint=localhost:8099 --sdk_location=container

Thanks,
Thomas

On Tue, Jul 3, 2018 at 1:39 AM Eugene Kirpichov <ki...@google.com>
wrote:

> Updated the Flink section.
>
> To run a basic Python wordcount (sent to you in a separate thread, but
> repeating here too for others to play with):
>
> Step 1: Run once to build a container: "./gradlew -p sdks/python/container
> docker"
> Step 2: ./gradlew :beam-runners-flink_2.11-job-server:runShadow - this
> starts up a local Flink portable JobService endpoint on localhost:8099
> Step 3: run things using PortableRunner pointed at this endpoint - see
> e.g. https://github.com/apache/beam/pull/5824/files
>
> On Thu, Jun 28, 2018 at 1:37 AM Thomas Weise <th...@apache.org> wrote:
>
>> Ankur/Eugene,
>>
>> When you have a chance, please also update the Flink section of:
>> https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0
>>
>> Thanks!
>>
>> On Thu, Jun 28, 2018 at 10:29 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> The command to run the job server appears to be: ./gradlew -p
>>> runners/flink/job-server runShadow
>>>
>>> Can you please provide the equivalent of the super basic Python example
>>> from the prototype:
>>>
>>>
>>> https://github.com/bsidhom/beam/blob/hacking-job-server/sdks/python/flink-example.py
>>>
>>> Looks as if the Python side runner changed:
>>>
>>> Traceback (most recent call last):
>>>   File "flink-example.py", line 7, in <module>
>>>     from apache_beam.runners.portability import universal_local_runner
>>> ImportError: cannot import name universal_local_runner
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Wed, Jun 27, 2018 at 9:34 PM Eugene Kirpichov <ki...@google.com>
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> Those instructions are not current and I think should be discarded as
>>>> they referred to a particular effort that is over - +Ankur Goenka
>>>> <go...@google.com> is, I believe, working on the remaining finishing
>>>> touches for running from a clean clone of Beam master and documenting how
>>>> to do that; could you help Thomas so we can start looking at what the
>>>> streaming runner is missing?
>>>>
>>>> We'll need to document this in a more prominent place. When we get to a
>>>> state where we can run Python WordCount from master, we'll need to document
>>>> it somewhere on the main portability page and/or the getting started guide;
>>>> when we can run something more serious, e.g. Tensorflow pipelines, that
>>>> will be worth a Beam blog post and worth documenting in the TFX
>>>> documentation.
>>>>
>>>> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> Hi Eugene,
>>>>>
>>>>> The basic streaming translation is already in place from the
>>>>> prototype, though I have not verified it on the master branch yet.
>>>>>
>>>>> Are the user instructions for the portable Flink runner at
>>>>> https://s.apache.org/beam-portability-team-doc current?
>>>>>
>>>>> (I don't have a dependency on SDF since we are going to use custom
>>>>> native Flink sources/sinks at this time.)
>>>>>
>>>>> Thanks,
>>>>> Thomas
>>>>>
>>>>>
>>>>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> Wanted to let you know that I've just merged the PR that adds
>>>>>> checkpointable SDF support to the portable reference runner (ULR) and the
>>>>>> Java SDK harness:
>>>>>>
>>>>>> https://github.com/apache/beam/pull/5566
>>>>>>
>>>>>> So now we have a reference implementation of SDF support in a
>>>>>> portable runner, and a reference implementation of SDF support in a
>>>>>> portable SDK harness.
>>>>>> From here on, we need to replicate this support in other portable
>>>>>> runners and other harnesses. The obvious targets are Flink and Python
>>>>>> respectively.
>>>>>>
>>>>>> Chamikara was going to work on the Python harness. +Thomas Weise
>>>>>> <th...@apache.org> Would you be interested in the Flink portable
>>>>>> streaming runner side? It is of course blocked by having the rest of that
>>>>>> runner working in streaming mode though (the batch mode is practically done
>>>>>> - will send you a separate note about the status of that).
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <
>>>>>> kirpichov@google.com> wrote:
>>>>>>
>>>>>>> Luke is right - unbounded sources should go through SDF. I am
>>>>>>> currently working on adding such support to Fn API.
>>>>>>> The relevant document is s.apache.org/beam-breaking-fusion (note:
>>>>>>> it focuses on a much more general case, but also considers in detail the
>>>>>>> specific case of running unbounded sources on Fn API), and the first
>>>>>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>>>>>
>>>>>>> Ways you can help speed up this effort:
>>>>>>> - Make necessary changes to Apex runner per se to support regular
>>>>>>> SDFs in streaming (without portability). They will likely largely carry
>>>>>>> over to portable world. I recall that the Apex runner had some level of
>>>>>>> support of SDFs, but didn't pass the ValidatesRunner tests yet.
>>>>>>> - (general to Beam, not Apex-related per se) Implement the
>>>>>>> translation of Read.from(UnboundedSource) via impulse, which will require
>>>>>>> implementing an SDF that reads from a given UnboundedSource (taking the
>>>>>>> UnboundedSource as an element). This should be fairly straightforward and
>>>>>>> will allow all portable runners to take advantage of existing
>>>>>>> UnboundedSource's.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>>>>>
>>>>>>>> This JIRA represents the work that would be to add support for
>>>>>>>> unbounded SDF using portability APIs:
>>>>>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>>>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>>>>>>
>>>>>>>>> Any pointers what's involved in adding the SDF support? Is it
>>>>>>>>> runner specific? Does the ULR cover it?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> All "sources" in portability will use splittable DoFns for
>>>>>>>>>> execution.
>>>>>>>>>>
>>>>>>>>>> Specifically, runners will need to be able to checkpoint
>>>>>>>>>> unbounded sources to get a minimum viable pipeline working.
>>>>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>>>>>> source.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I'm looking at the portable pipeline translation for streaming.
>>>>>>>>>>> I understand that for batch pipelines, it is sufficient to translate
>>>>>>>>>>> Impulse.
>>>>>>>>>>>
>>>>>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>>>>>
>>>>>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>>>>>> allow streaming wordcount execution.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>

Re: Unbounded source translation for portable pipelines

Posted by Eugene Kirpichov <ki...@google.com>.
Updated the Flink section.

To run a basic Python wordcount (sent to you in a separate thread, but
repeating here too for others to play with):

Step 1: Run once to build a container: "./gradlew -p sdks/python/container
docker"
Step 2: ./gradlew :beam-runners-flink_2.11-job-server:runShadow - this
starts up a local Flink portable JobService endpoint on localhost:8099
Step 3: run things using PortableRunner pointed at this endpoint - see e.g.
https://github.com/apache/beam/pull/5824/files

On Thu, Jun 28, 2018 at 1:37 AM Thomas Weise <th...@apache.org> wrote:

> Ankur/Eugene,
>
> When you have a chance, please also update the Flink section of:
> https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0
>
> Thanks!
>
> On Thu, Jun 28, 2018 at 10:29 AM Thomas Weise <th...@apache.org> wrote:
>
>> The command to run the job server appears to be: ./gradlew -p
>> runners/flink/job-server runShadow
>>
>> Can you please provide the equivalent of the super basic Python example
>> from the prototype:
>>
>>
>> https://github.com/bsidhom/beam/blob/hacking-job-server/sdks/python/flink-example.py
>>
>> Looks as if the Python side runner changed:
>>
>> Traceback (most recent call last):
>>   File "flink-example.py", line 7, in <module>
>>     from apache_beam.runners.portability import universal_local_runner
>> ImportError: cannot import name universal_local_runner
>>
>> Thanks,
>> Thomas
>>
>>
>> On Wed, Jun 27, 2018 at 9:34 PM Eugene Kirpichov <ki...@google.com>
>> wrote:
>>
>>> Hi!
>>>
>>> Those instructions are not current and I think should be discarded as
>>> they referred to a particular effort that is over - +Ankur Goenka
>>> <go...@google.com> is, I believe, working on the remaining finishing
>>> touches for running from a clean clone of Beam master and documenting how
>>> to do that; could you help Thomas so we can start looking at what the
>>> streaming runner is missing?
>>>
>>> We'll need to document this in a more prominent place. When we get to a
>>> state where we can run Python WordCount from master, we'll need to document
>>> it somewhere on the main portability page and/or the getting started guide;
>>> when we can run something more serious, e.g. Tensorflow pipelines, that
>>> will be worth a Beam blog post and worth documenting in the TFX
>>> documentation.
>>>
>>> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> Hi Eugene,
>>>>
>>>> The basic streaming translation is already in place from the prototype,
>>>> though I have not verified it on the master branch yet.
>>>>
>>>> Are the user instructions for the portable Flink runner at
>>>> https://s.apache.org/beam-portability-team-doc current?
>>>>
>>>> (I don't have a dependency on SDF since we are going to use custom
>>>> native Flink sources/sinks at this time.)
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> Wanted to let you know that I've just merged the PR that adds
>>>>> checkpointable SDF support to the portable reference runner (ULR) and the
>>>>> Java SDK harness:
>>>>>
>>>>> https://github.com/apache/beam/pull/5566
>>>>>
>>>>> So now we have a reference implementation of SDF support in a portable
>>>>> runner, and a reference implementation of SDF support in a portable SDK
>>>>> harness.
>>>>> From here on, we need to replicate this support in other portable
>>>>> runners and other harnesses. The obvious targets are Flink and Python
>>>>> respectively.
>>>>>
>>>>> Chamikara was going to work on the Python harness. +Thomas Weise
>>>>> <th...@apache.org> Would you be interested in the Flink portable
>>>>> streaming runner side? It is of course blocked by having the rest of that
>>>>> runner working in streaming mode though (the batch mode is practically done
>>>>> - will send you a separate note about the status of that).
>>>>>
>>>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <
>>>>> kirpichov@google.com> wrote:
>>>>>
>>>>>> Luke is right - unbounded sources should go through SDF. I am
>>>>>> currently working on adding such support to Fn API.
>>>>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>>>>>> focuses on a much more general case, but also considers in detail the
>>>>>> specific case of running unbounded sources on Fn API), and the first
>>>>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>>>>
>>>>>> Ways you can help speed up this effort:
>>>>>> - Make necessary changes to Apex runner per se to support regular
>>>>>> SDFs in streaming (without portability). They will likely largely carry
>>>>>> over to portable world. I recall that the Apex runner had some level of
>>>>>> support of SDFs, but didn't pass the ValidatesRunner tests yet.
>>>>>> - (general to Beam, not Apex-related per se) Implement the
>>>>>> translation of Read.from(UnboundedSource) via impulse, which will require
>>>>>> implementing an SDF that reads from a given UnboundedSource (taking the
>>>>>> UnboundedSource as an element). This should be fairly straightforward and
>>>>>> will allow all portable runners to take advantage of existing
>>>>>> UnboundedSource's.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>>>>
>>>>>>> This JIRA represents the work that would be to add support for
>>>>>>> unbounded SDF using portability APIs:
>>>>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>>>>>
>>>>>>>> Any pointers what's involved in adding the SDF support? Is it
>>>>>>>> runner specific? Does the ULR cover it?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> All "sources" in portability will use splittable DoFns for
>>>>>>>>> execution.
>>>>>>>>>
>>>>>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>>>>>> sources to get a minimum viable pipeline working.
>>>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>>>>> source.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>>>>>>
>>>>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>>>>
>>>>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>>>>> allow streaming wordcount execution.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Thomas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>

Re: Unbounded source translation for portable pipelines

Posted by Thomas Weise <th...@apache.org>.
Ankur/Eugene,

When you have a chance, please also update the Flink section of:
https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0

Thanks!

On Thu, Jun 28, 2018 at 10:29 AM Thomas Weise <th...@apache.org> wrote:

> The command to run the job server appears to be: ./gradlew -p
> runners/flink/job-server runShadow
>
> Can you please provide the equivalent of the super basic Python example
> from the prototype:
>
>
> https://github.com/bsidhom/beam/blob/hacking-job-server/sdks/python/flink-example.py
>
> Looks as if the Python side runner changed:
>
> Traceback (most recent call last):
>   File "flink-example.py", line 7, in <module>
>     from apache_beam.runners.portability import universal_local_runner
> ImportError: cannot import name universal_local_runner
>
> Thanks,
> Thomas
>
>
> On Wed, Jun 27, 2018 at 9:34 PM Eugene Kirpichov <ki...@google.com>
> wrote:
>
>> Hi!
>>
>> Those instructions are not current and I think should be discarded as
>> they referred to a particular effort that is over - +Ankur Goenka
>> <go...@google.com> is, I believe, working on the remaining finishing
>> touches for running from a clean clone of Beam master and documenting how
>> to do that; could you help Thomas so we can start looking at what the
>> streaming runner is missing?
>>
>> We'll need to document this in a more prominent place. When we get to a
>> state where we can run Python WordCount from master, we'll need to document
>> it somewhere on the main portability page and/or the getting started guide;
>> when we can run something more serious, e.g. Tensorflow pipelines, that
>> will be worth a Beam blog post and worth documenting in the TFX
>> documentation.
>>
>> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <th...@apache.org> wrote:
>>
>>> Hi Eugene,
>>>
>>> The basic streaming translation is already in place from the prototype,
>>> though I have not verified it on the master branch yet.
>>>
>>> Are the user instructions for the portable Flink runner at
>>> https://s.apache.org/beam-portability-team-doc current?
>>>
>>> (I don't have a dependency on SDF since we are going to use custom
>>> native Flink sources/sinks at this time.)
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> Wanted to let you know that I've just merged the PR that adds
>>>> checkpointable SDF support to the portable reference runner (ULR) and the
>>>> Java SDK harness:
>>>>
>>>> https://github.com/apache/beam/pull/5566
>>>>
>>>> So now we have a reference implementation of SDF support in a portable
>>>> runner, and a reference implementation of SDF support in a portable SDK
>>>> harness.
>>>> From here on, we need to replicate this support in other portable
>>>> runners and other harnesses. The obvious targets are Flink and Python
>>>> respectively.
>>>>
>>>> Chamikara was going to work on the Python harness. +Thomas Weise
>>>> <th...@apache.org> Would you be interested in the Flink portable
>>>> streaming runner side? It is of course blocked by having the rest of that
>>>> runner working in streaming mode though (the batch mode is practically done
>>>> - will send you a separate note about the status of that).
>>>>
>>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <ki...@google.com>
>>>> wrote:
>>>>
>>>>> Luke is right - unbounded sources should go through SDF. I am
>>>>> currently working on adding such support to Fn API.
>>>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>>>>> focuses on a much more general case, but also considers in detail the
>>>>> specific case of running unbounded sources on Fn API), and the first
>>>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>>>
>>>>> Ways you can help speed up this effort:
>>>>> - Make necessary changes to Apex runner per se to support regular SDFs
>>>>> in streaming (without portability). They will likely largely carry over to
>>>>> portable world. I recall that the Apex runner had some level of support of
>>>>> SDFs, but didn't pass the ValidatesRunner tests yet.
>>>>> - (general to Beam, not Apex-related per se) Implement the translation
>>>>> of Read.from(UnboundedSource) via impulse, which will require implementing
>>>>> an SDF that reads from a given UnboundedSource (taking the UnboundedSource
>>>>> as an element). This should be fairly straightforward and will allow all
>>>>> portable runners to take advantage of existing UnboundedSource's.
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>>>
>>>>>> This JIRA represents the work that would be to add support for
>>>>>> unbounded SDF using portability APIs:
>>>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>>>>
>>>>>>> Any pointers what's involved in adding the SDF support? Is it runner
>>>>>>> specific? Does the ULR cover it?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> All "sources" in portability will use splittable DoFns for
>>>>>>>> execution.
>>>>>>>>
>>>>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>>>>> sources to get a minimum viable pipeline working.
>>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>>>> source.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>>>>>
>>>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>>>
>>>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>>>> allow streaming wordcount execution.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>>
>>>>>>>

Re: Unbounded source translation for portable pipelines

Posted by Thomas Weise <th...@apache.org>.
The command to run the job server appears to be: ./gradlew -p
runners/flink/job-server runShadow

Can you please provide the equivalent of the super basic Python example
from the prototype:

https://github.com/bsidhom/beam/blob/hacking-job-server/sdks/python/flink-example.py

Looks as if the Python side runner changed:

Traceback (most recent call last):
  File "flink-example.py", line 7, in <module>
    from apache_beam.runners.portability import universal_local_runner
ImportError: cannot import name universal_local_runner

Thanks,
Thomas


On Wed, Jun 27, 2018 at 9:34 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Hi!
>
> Those instructions are not current and I think should be discarded as they
> referred to a particular effort that is over - +Ankur Goenka
> <go...@google.com> is, I believe, working on the remaining finishing
> touches for running from a clean clone of Beam master and documenting how
> to do that; could you help Thomas so we can start looking at what the
> streaming runner is missing?
>
> We'll need to document this in a more prominent place. When we get to a
> state where we can run Python WordCount from master, we'll need to document
> it somewhere on the main portability page and/or the getting started guide;
> when we can run something more serious, e.g. Tensorflow pipelines, that
> will be worth a Beam blog post and worth documenting in the TFX
> documentation.
>
> On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <th...@apache.org> wrote:
>
>> Hi Eugene,
>>
>> The basic streaming translation is already in place from the prototype,
>> though I have not verified it on the master branch yet.
>>
>> Are the user instructions for the portable Flink runner at
>> https://s.apache.org/beam-portability-team-doc current?
>>
>> (I don't have a dependency on SDF since we are going to use custom native
>> Flink sources/sinks at this time.)
>>
>> Thanks,
>> Thomas
>>
>>
>> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
>> wrote:
>>
>>> Hi!
>>>
>>> Wanted to let you know that I've just merged the PR that adds
>>> checkpointable SDF support to the portable reference runner (ULR) and the
>>> Java SDK harness:
>>>
>>> https://github.com/apache/beam/pull/5566
>>>
>>> So now we have a reference implementation of SDF support in a portable
>>> runner, and a reference implementation of SDF support in a portable SDK
>>> harness.
>>> From here on, we need to replicate this support in other portable
>>> runners and other harnesses. The obvious targets are Flink and Python
>>> respectively.
>>>
>>> Chamikara was going to work on the Python harness. +Thomas Weise
>>> <th...@apache.org> Would you be interested in the Flink portable
>>> streaming runner side? It is of course blocked by having the rest of that
>>> runner working in streaming mode though (the batch mode is practically done
>>> - will send you a separate note about the status of that).
>>>
>>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <ki...@google.com>
>>> wrote:
>>>
>>>> Luke is right - unbounded sources should go through SDF. I am currently
>>>> working on adding such support to Fn API.
>>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>>>> focuses on a much more general case, but also considers in detail the
>>>> specific case of running unbounded sources on Fn API), and the first
>>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>>
>>>> Ways you can help speed up this effort:
>>>> - Make necessary changes to Apex runner per se to support regular SDFs
>>>> in streaming (without portability). They will likely largely carry over to
>>>> portable world. I recall that the Apex runner had some level of support of
>>>> SDFs, but didn't pass the ValidatesRunner tests yet.
>>>> - (general to Beam, not Apex-related per se) Implement the translation
>>>> of Read.from(UnboundedSource) via impulse, which will require implementing
>>>> an SDF that reads from a given UnboundedSource (taking the UnboundedSource
>>>> as an element). This should be fairly straightforward and will allow all
>>>> portable runners to take advantage of existing UnboundedSource's.
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>>
>>>>> This JIRA represents the work that would be to add support for
>>>>> unbounded SDF using portability APIs:
>>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>>>
>>>>>> Any pointers what's involved in adding the SDF support? Is it runner
>>>>>> specific? Does the ULR cover it?
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> All "sources" in portability will use splittable DoFns for execution.
>>>>>>>
>>>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>>>> sources to get a minimum viable pipeline working.
>>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>>> source.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>>>>
>>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>>
>>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>>> allow streaming wordcount execution.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Thomas
>>>>>>>>
>>>>>>>>
>>>>>>

Re: Unbounded source translation for portable pipelines

Posted by Eugene Kirpichov <ki...@google.com>.
Hi!

Those instructions are not current and I think should be discarded as they
referred to a particular effort that is over - +Ankur Goenka
<go...@google.com> is, I believe, working on the remaining finishing
touches for running from a clean clone of Beam master and documenting how
to do that; could you help Thomas so we can start looking at what the
streaming runner is missing?

We'll need to document this in a more prominent place. When we get to a
state where we can run Python WordCount from master, we'll need to document
it somewhere on the main portability page and/or the getting started guide;
when we can run something more serious, e.g. Tensorflow pipelines, that
will be worth a Beam blog post and worth documenting in the TFX
documentation.

On Wed, Jun 27, 2018 at 5:35 AM Thomas Weise <th...@apache.org> wrote:

> Hi Eugene,
>
> The basic streaming translation is already in place from the prototype,
> though I have not verified it on the master branch yet.
>
> Are the user instructions for the portable Flink runner at
> https://s.apache.org/beam-portability-team-doc current?
>
> (I don't have a dependency on SDF since we are going to use custom native
> Flink sources/sinks at this time.)
>
> Thanks,
> Thomas
>
>
> On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
> wrote:
>
>> Hi!
>>
>> Wanted to let you know that I've just merged the PR that adds
>> checkpointable SDF support to the portable reference runner (ULR) and the
>> Java SDK harness:
>>
>> https://github.com/apache/beam/pull/5566
>>
>> So now we have a reference implementation of SDF support in a portable
>> runner, and a reference implementation of SDF support in a portable SDK
>> harness.
>> From here on, we need to replicate this support in other portable runners
>> and other harnesses. The obvious targets are Flink and Python respectively.
>>
>> Chamikara was going to work on the Python harness. +Thomas Weise
>> <th...@apache.org> Would you be interested in the Flink portable streaming
>> runner side? It is of course blocked by having the rest of that runner
>> working in streaming mode though (the batch mode is practically done - will
>> send you a separate note about the status of that).
>>
>> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <ki...@google.com>
>> wrote:
>>
>>> Luke is right - unbounded sources should go through SDF. I am currently
>>> working on adding such support to Fn API.
>>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>>> focuses on a much more general case, but also considers in detail the
>>> specific case of running unbounded sources on Fn API), and the first
>>> related PR is https://github.com/apache/beam/pull/4743 .
>>>
>>> Ways you can help speed up this effort:
>>> - Make necessary changes to Apex runner per se to support regular SDFs
>>> in streaming (without portability). They will likely largely carry over to
>>> portable world. I recall that the Apex runner had some level of support of
>>> SDFs, but didn't pass the ValidatesRunner tests yet.
>>> - (general to Beam, not Apex-related per se) Implement the translation
>>> of Read.from(UnboundedSource) via impulse, which will require implementing
>>> an SDF that reads from a given UnboundedSource (taking the UnboundedSource
>>> as an element). This should be fairly straightforward and will allow all
>>> portable runners to take advantage of existing UnboundedSource's.
>>>
>>>
>>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>>
>>>> This JIRA represents the work that would be to add support for
>>>> unbounded SDF using portability APIs:
>>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>>> So for streaming, we will need the Impulse translation for bounded
>>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>>
>>>>> Any pointers what's involved in adding the SDF support? Is it runner
>>>>> specific? Does the ULR cover it?
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com>
>>>>> wrote:
>>>>>
>>>>>> All "sources" in portability will use splittable DoFns for execution.
>>>>>>
>>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>>> sources to get a minimum viable pipeline working.
>>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>>> source.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>>>
>>>>>>> What is the intended path to support unbounded sources?
>>>>>>>
>>>>>>> The goal here is to get a minimum translation working that will
>>>>>>> allow streaming wordcount execution.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>>>

Re: Unbounded source translation for portable pipelines

Posted by Thomas Weise <th...@apache.org>.
Hi Eugene,

The basic streaming translation is already in place from the prototype,
though I have not verified it on the master branch yet.

Are the user instructions for the portable Flink runner at
https://s.apache.org/beam-portability-team-doc current?

(I don't have a dependency on SDF since we are going to use custom native
Flink sources/sinks at this time.)

Thanks,
Thomas


On Tue, Jun 26, 2018 at 2:13 AM Eugene Kirpichov <ki...@google.com>
wrote:

> Hi!
>
> Wanted to let you know that I've just merged the PR that adds
> checkpointable SDF support to the portable reference runner (ULR) and the
> Java SDK harness:
>
> https://github.com/apache/beam/pull/5566
>
> So now we have a reference implementation of SDF support in a portable
> runner, and a reference implementation of SDF support in a portable SDK
> harness.
> From here on, we need to replicate this support in other portable runners
> and other harnesses. The obvious targets are Flink and Python respectively.
>
> Chamikara was going to work on the Python harness. +Thomas Weise
> <th...@apache.org> Would you be interested in the Flink portable streaming
> runner side? It is of course blocked by having the rest of that runner
> working in streaming mode though (the batch mode is practically done - will
> send you a separate note about the status of that).
>
> On Fri, Mar 23, 2018 at 12:20 PM Eugene Kirpichov <ki...@google.com>
> wrote:
>
>> Luke is right - unbounded sources should go through SDF. I am currently
>> working on adding such support to Fn API.
>> The relevant document is s.apache.org/beam-breaking-fusion (note: it
>> focuses on a much more general case, but also considers in detail the
>> specific case of running unbounded sources on Fn API), and the first
>> related PR is https://github.com/apache/beam/pull/4743 .
>>
>> Ways you can help speed up this effort:
>> - Make necessary changes to Apex runner per se to support regular SDFs in
>> streaming (without portability). They will likely largely carry over to
>> portable world. I recall that the Apex runner had some level of support of
>> SDFs, but didn't pass the ValidatesRunner tests yet.
>> - (general to Beam, not Apex-related per se) Implement the translation of
>> Read.from(UnboundedSource) via impulse, which will require implementing an
>> SDF that reads from a given UnboundedSource (taking the UnboundedSource as
>> an element). This should be fairly straightforward and will allow all
>> portable runners to take advantage of existing UnboundedSource's.
>>
>>
>> On Fri, Mar 23, 2018 at 3:08 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> Using impulse is a precursor for both bounded and unbounded SDF.
>>>
>>> This JIRA represents the work that would be to add support for unbounded
>>> SDF using portability APIs:
>>> https://issues.apache.org/jira/browse/BEAM-2939
>>>
>>>
>>> On Fri, Mar 23, 2018 at 11:46 AM Thomas Weise <th...@apache.org> wrote:
>>>
>>>> So for streaming, we will need the Impulse translation for bounded
>>>> input, identical with batch, and then in addition to that support for SDF?
>>>>
>>>> Any pointers what's involved in adding the SDF support? Is it runner
>>>> specific? Does the ULR cover it?
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 11:26 AM, Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> All "sources" in portability will use splittable DoFns for execution.
>>>>>
>>>>> Specifically, runners will need to be able to checkpoint unbounded
>>>>> sources to get a minimum viable pipeline working.
>>>>> For bounded pipelines, a DoFn can read the contents of a bounded
>>>>> source.
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 10:52 AM Thomas Weise <th...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm looking at the portable pipeline translation for streaming. I
>>>>>> understand that for batch pipelines, it is sufficient to translate Impulse.
>>>>>>
>>>>>> What is the intended path to support unbounded sources?
>>>>>>
>>>>>> The goal here is to get a minimum translation working that will allow
>>>>>> streaming wordcount execution.
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>