Posted to dev@beam.apache.org by Ben Chambers <bc...@apache.org> on 2016/03/15 01:25:44 UTC

Design document for Static Display Data

Hi!

The following document describes work that we're planning to do to allow
every step in a pipeline to include more information about what is
actually going on. The goal is to allow UIs and diagnostic tools to display
details about what is happening inside each step by including details about
what are otherwise just serialized blobs of code.

Everyone should be able to comment on the following link:
https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing

We will be creating a Jira issue to track the implementation of the
associated SDK changes.

Thanks,
Ben

Re: Design document for Static Display Data

Posted by Ben Chambers <bc...@google.com.INVALID>.
Hi JB.

I attempted to clarify in the proposal as well, but this is focused on
static information -- details that are known during pipeline construction
but would also be useful for display.

For example, if you used the Top.perKey(n) transform, all the execution of
the pipeline needs to know is that it is running some serialized CombineFn,
and the code does the rest. But it would be really useful if we could
display that the code being executed is Top, and that it was configured
with "n = 10".

The SDK already features aggregators (~ counters) for monitoring tasks.
While there are potential improvements to the aggregator API and
functionality, they are outside the scope of this proposal.
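
To make the Top example above concrete, here is a minimal Python sketch of
the idea. It does not use the SDK, and it is not the API from the design
document: the names TopPerKey, display_data and describe_step, as well as
the dictionary layout of a step, are hypothetical and exist only for
illustration. The point is that the runner sees an opaque blob of serialized
code, while the static details known at construction time travel alongside
it in a readable form.

```python
import heapq
import marshal


class TopPerKey:
    """Hypothetical stand-in for a Top-style transform (not the SDK class)."""

    def __init__(self, n):
        self.n = n  # configuration known at pipeline construction time

    def combine(self, values):
        # The actual work: keep the n largest values.
        return heapq.nlargest(self.n, values)

    def display_data(self):
        # Static details a UI could show without running any user code.
        return {"transform": "Top", "n": self.n}


def describe_step(name, transform):
    """Build what a runner might receive for one step (assumed layout)."""
    return {
        "name": name,
        # Opaque to a UI: just the serialized bytecode of the combine logic.
        "payload": marshal.dumps(type(transform).combine.__code__),
        # Readable by a UI: the static display data.
        "display_data": transform.display_data(),
    }


step = describe_step("TopScores", TopPerKey(10))
for key, value in sorted(step["display_data"].items()):
    print(f"{key} = {value}")
```

Without the display_data entry, a tool inspecting this step would only have
the payload bytes to look at, which is the situation the proposal is trying
to improve.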

Let me know if that doesn't clarify things. And thanks for taking a look!
Ben

On Mon, Mar 14, 2016, 5:59 PM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> Hi Ben,
>
> thanks for the update.
>
> Correct me if I'm wrong: you are proposing a kind of monitoring of the
> pipelines, with the ability to trace what's going on during the execution
> of the pipeline.
> It's a very important feature, especially for streaming (and data
> integration).
>
> The document describes the data display very well. Do you have any plan to
> implement a kind of "checkpoint"/alerting depending on some predicates on
> the data (it's something that I had in mind for the Beam data
> integration DSL)? Maybe it's the TRIGGER type?
>
> Thanks again, great document and idea.
>
> Regards
> JB
>
> On 03/15/2016 01:25 AM, Ben Chambers wrote:
> > Hi!
> >
> > The following document describes work that we're planning to do to allow
> > every step in a pipeline to include more information about what is
> > actually going on. The goal is to allow UIs and diagnostic tools to display
> > details about what is happening inside each step by including details about
> > what are otherwise just serialized blobs of code.
> >
> > Everyone should be able to comment on the following link:
> > https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing
> >
> > We will be creating a Jira issue to track the implementation of the
> > associated SDK changes.
> >
> > Thanks,
> > Ben
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Design document for Static Display Data

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Ben,

thanks for the update.

Correct me if I'm wrong: you are proposing a kind of monitoring of the
pipelines, with the ability to trace what's going on during the execution
of the pipeline.
It's a very important feature, especially for streaming (and data integration).

The document describes the data display very well. Do you have any plan to
implement a kind of "checkpoint"/alerting depending on some predicates on
the data (it's something that I had in mind for the Beam data
integration DSL)? Maybe it's the TRIGGER type?

Thanks again, great document and idea.

Regards
JB

On 03/15/2016 01:25 AM, Ben Chambers wrote:
> Hi!
>
> The following document describes work that we're planning to do to allow
> every step in a pipeline to include more information about what is
> actually going on. The goal is to allow UIs and diagnostic tools to display
> details about what is happening inside each step by including details about
> what are otherwise just serialized blobs of code.
>
> Everyone should be able to comment on the following link:
> https://docs.google.com/document/d/11enEB9JwVp6vO0uOYYTMYTGkr3TdNfELwWqoiUg5ZxM/edit?usp=sharing
>
> We will be creating a Jira issue to track the implementation of the
> associated SDK changes.
>
> Thanks,
> Ben
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com