Posted to dev@beam.apache.org by Chad Dombrova <ch...@gmail.com> on 2020/12/02 18:30:44 UTC

Proposal: Scheduled tasks

Hi everyone,
Beam's niche is low-latency, high-throughput workloads, but Beam also has
incredible promise as an orchestrator of long-running work that gets sent
to a scheduler.  We've created a modified version of Beam that allows the
Python SDK worker to outsource tasks to a scheduler, such as Kubernetes batch
jobs[1], Argo[2], or Google's own OpenCue[3].

The basic idea is that any element in a stream can be tagged to be executed
outside of the normal SdkWorker as an atomic "task".  A task is one
invocation of a stage, composed of one or more DoFns, against a slice
of the data stream, composed of one or more tagged elements.  The upshot
is that we're able to slice up the processing of a stream across
potentially *many* workers, with the trade-off being the added overhead of
starting up a worker process for each task.
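
To make that concrete, here's a minimal sketch of what the tagging could
look like from the pipeline author's side.  The TaskableValue wrapper and
the env hint below are names I'm using for this email to illustrate the
shape of the idea, not the exact API; the real details are in the design
doc and branch linked below:

import apache_beam as beam

# Illustrative stand-in for the tagging mechanism: wrapping an element
# marks the stage that consumes it to run as an atomic task on an
# external scheduler rather than inside the local SdkWorker.
class TaskableValue(object):
    def __init__(self, value, env):
        self.value = value
        self.env = env  # scheduler hint, e.g. backend and resources

def tag_for_scheduler(frame):
    # One task is created per tagged element (here, per frame).
    return TaskableValue(frame, env={'backend': 'kubernetes', 'cpus': 8})

def render_frame(task):
    # The long-running, per-element work (e.g. rendering one frame).
    return task.value

with beam.Pipeline() as p:
    _ = (p
         | beam.Create(range(1, 101))   # e.g. frames 1-100 of a shot
         | beam.Map(tag_for_scheduler)  # tag elements for task execution
         | beam.Map(render_frame))      # runs on the scheduler, per task

The point is that the DoFns themselves don't change; the wrapper just
tells the runner that each tagged element is a unit of work it can hand
off to the scheduler.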

For more info on how we use our modified version of Beam to make visual
effects for feature films, check out the talk[4] I gave at the Beam Summit.

Here's our design doc:
https://docs.google.com/document/d/1GrAvDWwnR1QAmFX7lnNA7I_mQBC2G1V2jE2CZOc6rlw/edit?usp=sharing

And here's the github branch:
https://github.com/LumaPictures/beam/tree/taskworker_public

Looking forward to your feedback!
-chad


[1] https://kubernetes.io/docs/concepts/workloads/controllers/job/
[2] https://argoproj.github.io/
[3] https://cloud.google.com/opencue
[4] https://www.youtube.com/watch?v=gvbQI3I03a8&ab_channel=ApacheBeam

Re: Proposal: Scheduled tasks

Posted by Chad Dombrova <ch...@gmail.com>.
Thanks!

On Tue, Dec 8, 2020 at 6:54 AM Pablo Estrada <pa...@google.com> wrote:

> Hi Chad!
> I've been meaning to review this; I've just not carved out the time. I'll
> try to get back to you this week with some thoughts!
> Thanks!
> -P.

Re: Proposal: Scheduled tasks

Posted by Pablo Estrada <pa...@google.com>.
Hi Chad!
I've been meaning to review this; I've just not carved out the time. I'll
try to get back to you this week with some thoughts!
Thanks!
-P.
