Posted to dev@beam.apache.org by Maximilian Michels <mx...@apache.org> on 2018/10/05 14:05:00 UTC

Beam Summit community feedback

Hi,

What do you think about collecting some of the feedback from the 
community at Beam Summit last week? Here's what I've come across:


* The Kubernetes / Docker Story

Multiple users reported that they would like a Beam-Kubernetes story. 
What is the best way to deploy Beam with Kubernetes? Will there be 
built-in support?

Especially with regard to portability, there are some unsolved
problems, e.g. how do we start Beam containerized and bootstrap the SDK
Harness container from within a container? For local testing with the
JobServer we support that by mounting the Docker socket, but this will
be too fragile in production scenarios. Now that we have process-based
execution, we could just use that inside the main container.

Deployment is a very important topic for users and we should try to 
reduce complexity as much as possible.

* External SDKs / Scio

Users have asked why Scio is not part of the main repository. Generally,
I don't think that has to be the case; the same goes for the Runners that
are not part of the main repo. However, it does raise the question: what
will be the future model for maintaining SDKs/IOs/Runners? How do we
ensure easy development and consistent quality of internal/external
components?

* Documenting Timers & State

These two have excellent blog posts but are not part of the official 
documentation. Since they are part of the model, it would be good to 
eventually update the docs.

* Better Debuggability of pipelines

Even a simple WordCount in Beam leads to quite a complex Flink execution
graph (due to the involved I/O logic). How can we make pipelines
easier to understand? Will we provide a way to visualize the 
architecture of high-level Beam pipelines? If so, do we provide a way to 
gain insight into how it is mapped to the Runner execution model? Users 
would like to have more insight.

* Current Roadmap

This was asked in the context of portability. By the end of the year we 
should have at least the FlinkRunner in a ready state, with the rest 
following up. There are a lot of other threads in Beam. The newsletter
is a great way to keep up with the project development.


Looking forward to any other points you might have.

Best,
Max

Re: Beam Summit community feedback

Posted by Matthias Baetens <ba...@gmail.com>.

Hey Max,

Great stuff, thank you for sharing this.
In case anyone has feedback on the summit as a whole, please feel free to
fill out the survey <https://goo.gl/forms/Oka3kicBrFyUXEvp1> as well.

Thank you!
Best regards,
Matthias

On Tue, 9 Oct 2018 at 10:48 Maximilian Michels <mx...@apache.org> wrote:

> Thanks for the pointer to the thread. I didn't know there had already
> been a discussion. It is possible to look at Kubernetes support solely
> from a Runner perspective, but we still have to provide the basic knobs
> in Beam to make deployment easy.
>
> The approach Henning described here and in the thread (Approach 2:
> https://lists.apache.org/thread.html/209ddf4d701c8c915e3b411e99773f491a6cd830807d636b470000e8@%3Cdev.beam.apache.org%3E)
> where the backend and the SDK harness are started concurrently with
> fixed endpoints would be the way to go. In the Proto we already have the
> "EXTERNAL" environment for that.

Re: Beam Summit community feedback

Posted by Maximilian Michels <mx...@apache.org>.

Thanks for the pointer to the thread. I didn't know there had already
been a discussion. It is possible to look at Kubernetes support solely
from a Runner perspective, but we still have to provide the basic knobs
in Beam to make deployment easy.

The approach Henning described here and in the thread (Approach 2: 
https://lists.apache.org/thread.html/209ddf4d701c8c915e3b411e99773f491a6cd830807d636b470000e8@%3Cdev.beam.apache.org%3E) 
where the backend and the SDK harness are started concurrently with 
fixed endpoints would be the way to go. In the Proto we already have the 
"EXTERNAL" environment for that.

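To make the "fixed endpoints" idea a bit more concrete, here is a rough
sketch of the flags a pipeline submission might carry when the SDK harness
is started separately. The flag names follow what later Beam Python SDK
releases use for the portable runner and may differ by version; the helper
name and the example endpoints are made up for illustration.

```python
def external_environment_args(job_endpoint, worker_endpoint):
    """Build pipeline flags for an SDK harness that is already running.

    Both arguments are "host:port" strings. The helper itself is
    hypothetical; only the flag names are taken from the Beam Python SDK,
    and they are an assumption for any given Beam version.
    """
    for name, value in (("job_endpoint", job_endpoint),
                        ("worker_endpoint", worker_endpoint)):
        host, sep, port = value.rpartition(":")
        if not (host and sep and port.isdigit()):
            raise ValueError(f"{name} must look like host:port, got {value!r}")
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        # EXTERNAL: the runner does not start the harness itself, it only
        # connects to the endpoint it is given.
        "--environment_type=EXTERNAL",
        f"--environment_config={worker_endpoint}",
    ]


# Example endpoints, chosen arbitrarily for this sketch.
args = external_environment_args("localhost:8099", "localhost:50000")
```

The resulting list could be passed to the pipeline as command-line style
options; nothing here starts a container, which is the point of the
approach.
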
On 08.10.18 20:18, Thomas Weise wrote:
> Related thread:
> 
> https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E
> 
> Kubernetes is otherwise more of a runner deployment concern. There are 
> efforts in the Flink community underway to make deployment on Kubernetes 
> easier.
> 
> Max: thanks for taking notes!

Re: Beam Summit community feedback

Posted by Thomas Weise <th...@apache.org>.

Related thread:

https://lists.apache.org/thread.html/d6b6fde764796de31996db9bb5f9de3e7aaf0ab29b99d0adb52ac508@%3Cdev.beam.apache.org%3E

Kubernetes is otherwise more of a runner deployment concern. There are
efforts in the Flink community underway to make deployment on Kubernetes
easier.

Max: thanks for taking notes!


On Mon, Oct 8, 2018 at 10:43 AM Henning Rohde <he...@google.com> wrote:

> Regarding the Kubernetes/Docker story: the current idea for that setup is
> to use a per-job pod for the user/sdk containers + runner container, so
> that running (and scaling) a job will go with the grain of that ecosystem.
> The Beam code on each worker thus wouldn't do any container management.
> This is also how Dataflow essentially works. The process-based option
> assumes that the runner environment is what the SDK needs, which is
> generally not the case.
>
> Henning

Re: Beam Summit community feedback

Posted by Henning Rohde <he...@google.com>.

Regarding the Kubernetes/Docker story: the current idea for that setup is
to use a per-job pod for the user/sdk containers + runner container, so
that running (and scaling) a job will go with the grain of that ecosystem.
The Beam code on each worker thus wouldn't do any container management.
This is also how Dataflow essentially works. The process-based option
assumes that the runner environment is what the SDK needs, which is
generally not the case.
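
One way to picture the per-job pod is as a single Kubernetes Pod manifest
with the runner and the SDK harness as sibling containers. The sketch below
builds such a manifest as a plain Python dict. The function name, image
names, labels, and port are placeholders chosen for illustration, not
anything Beam ships.

```python
import json


def per_job_pod(job_name, runner_image, sdk_image, harness_port=50000):
    """Return a Pod manifest (as a dict) for one Beam job.

    Hypothetical helper: runner and SDK harness run side by side, so the
    Beam code on the worker does no container management of its own.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"beam-{job_name}",
            "labels": {"beam-job": job_name},
        },
        "spec": {
            "containers": [
                {"name": "runner", "image": runner_image},
                {
                    "name": "sdk-harness",
                    "image": sdk_image,
                    "ports": [{"containerPort": harness_port}],
                },
            ],
        },
    }


# Placeholder image names; substitute real ones for an actual deployment.
manifest = per_job_pod(
    "wordcount",
    "example.invalid/runner:latest",
    "example.invalid/python-sdk:latest",
)
print(json.dumps(manifest, indent=2))
```

Containers in the same pod share a network namespace, so the two can reach
each other on localhost at fixed ports.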

Henning

On Sun, Oct 7, 2018 at 1:40 PM Alex Van Boxel <al...@vanboxel.be> wrote:

> Hey Max, I've built up quite some experience with *Kubernetes* over the
> years. The problem you describe seems like a custom operator story. The
> thing is, I don't know enough about the runner and bootstrapping story.
> After the summit I'm quite eager to dive into a Beam problem, so if you'd
> like to collaborate on that topic, let me know.
>
>  _/
> _/ Alex Van Boxel

Re: Beam Summit community feedback

Posted by Maximilian Michels <ma...@maximilianmichels.com>.

Hi Alex,

Would be great to have someone experienced with Kubernetes.

Not sure if it would require a custom Kubernetes Operator. It would 
probably suffice to have a dedicated Kubernetes mode which starts the 
Beam environment including Runner and dependencies. From there on, we 
wouldn't have to start additional containers.

The current portable approach requires us to spawn containers for the 
SDK Harness at runtime which wouldn't work on k8s, if I'm not mistaken.

-Max

On 07.10.18 22:40, Alex Van Boxel wrote:
> Hey Max, I've built up quite some experience with *Kubernetes* over the
> years. The problem you describe seems like a custom operator story. The
> thing is, I don't know enough about the runner and bootstrapping story.
> After the summit I'm quite eager to dive into a Beam problem, so if you'd
> like to collaborate on that topic, let me know.
> 
>   _/
> _/ Alex Van Boxel
> 

Re: Beam Summit community feedback

Posted by Alex Van Boxel <al...@vanboxel.be>.

Hey Max, I've built up quite some experience with *Kubernetes* over the years.
The problem you describe seems like a custom operator story. The thing is, I
don't know enough about the runner and bootstrapping story. After the summit
I'm quite eager to dive into a Beam problem, so if you'd like to collaborate
on that topic, let me know.

 _/
_/ Alex Van Boxel



Re: Beam Summit community feedback

Posted by Etienne Chauchot <ec...@apache.org>.

Thanks for sharing this, Max!
Etienne


Re: Beam Summit community feedback

Posted by Ankur Goenka <go...@google.com>.

Thanks Max for sharing.
