Posted to user@beam.apache.org by Joey Tran <jo...@schrodinger.com> on 2023/08/31 22:37:02 UTC

Options for visualizing the pipeline DAG

Hi all,

What're all the current options for visualizing a pipeline? I'm guessing
Dataflow has a visualization. I saw that there are also Apache Beam
notebooks through GCP, and I'm aware of the Beam playground, but is there
an easy way to create and view the visualization locally? For example, I
might have a large codebase that's used to construct and run a pipeline,
and in this case I don't think any of those three solutions would be very
easy to use to visualize my pipeline (though I could be wrong).

Best,
Joey


Re: Options for visualizing the pipeline DAG

Posted by Robert Bradshaw via user <us...@beam.apache.org>.
(As an aside, I think all of these options would make for a great blog post
if anyone is interested in authoring one of those...)

On Fri, Sep 1, 2023 at 9:26 AM Robert Bradshaw <ro...@google.com> wrote:

> You can also use Python's RenderRunner, e.g.
>
>   python -m apache_beam.examples.wordcount --output out.txt \
>     --runner=apache_beam.runners.render.RenderRunner \
>     --render_output=pipeline.svg
>
> This also has an interactive mode, triggered by passing --port=N (where 0
> picks an unused port), which serves the graph as a local web service so you
> can expand and collapse composites for easier exploration. Any
> --render_output arguments that are passed get re-rendered as you edit the
> graph. (It uses Graphviz under the hood, so it can render to any format
> Graphviz supports.)
>
> For rendering non-Python pipelines, one can start this up as a local
> portable "runner"
>
>   python -m apache_beam.runners.render
>
> and then "submit" this job from your other SDK over the jobs API to view
> it.
>
> [image: pipeline.png]

Re: Options for visualizing the pipeline DAG

Posted by Robert Bradshaw via user <us...@beam.apache.org>.
You can also use Python's RenderRunner, e.g.

  python -m apache_beam.examples.wordcount --output out.txt \
    --runner=apache_beam.runners.render.RenderRunner \
    --render_output=pipeline.svg

This also has an interactive mode, triggered by passing --port=N (where 0
picks an unused port), which serves the graph as a local web service so you
can expand and collapse composites for easier exploration. Any
--render_output arguments that are passed get re-rendered as you edit the
graph. (It uses Graphviz under the hood, so it can render to any format
Graphviz supports.)

For rendering non-Python pipelines, one can start this up as a local
portable "runner"

  python -m apache_beam.runners.render

and then "submit" this job from your other SDK over the jobs API to view
it.
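
If you'd rather trigger the Python case from inside your own code than from the command line, something like the following should work. It is only a sketch: `render_flags` and `render_demo` are names made up for illustration, the tiny word-count pipeline is a stand-in for your real one, and it assumes apache-beam and Graphviz are installed locally. The flags are the same two shown in the command above.

```python
def render_flags(outputs):
    """Build the pipeline flags from the command above for each output file."""
    flags = ["--runner=apache_beam.runners.render.RenderRunner"]
    # One --render_output flag per file; this sketch assumes the flag may be repeated.
    flags += [f"--render_output={path}" for path in outputs]
    return flags


def render_demo(outputs):
    """Construct a small stand-in pipeline and hand it to the RenderRunner."""
    # Imported lazily so render_flags() is usable without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(render_flags(outputs))
    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | "Create" >> beam.Create(["a b", "b c"])
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
        )
```

Calling `render_demo(["pipeline.svg"])` would then be the in-process equivalent of the wordcount command, with your own transforms in place of the stand-in ones.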

[image: pipeline.png]



On Fri, Sep 1, 2023 at 7:13 AM Joey Tran <jo...@schrodinger.com> wrote:

> Perfect, the `pipeline_graph` Python module in the Stack Overflow post [1] was
> exactly what I was looking for. The dependencies I'm working with are a bit
> heavyweight and likely difficult to install into a notebook, so I was
> looking for something I could do on my local machine.
>
> Thanks!
> Joey
>
> [1] -
> https://stackoverflow.com/questions/72592971/way-to-visualize-beam-pipeline-run-with-directrunner

Re: Options for visualizing the pipeline DAG

Posted by Joey Tran <jo...@schrodinger.com>.
Perfect, the `pipeline_graph` Python module in the Stack Overflow post [1] was
exactly what I was looking for. The dependencies I'm working with are a bit
heavyweight and likely difficult to install into a notebook, so I was
looking for something I could do on my local machine.

Thanks!
Joey

[1] -
https://stackoverflow.com/questions/72592971/way-to-visualize-beam-pipeline-run-with-directrunner
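
For anyone finding this later, the approach from [1] boils down to roughly the following. Treat it as a sketch: `pipeline_dot`, `graphviz_command` and `save_svg` are hypothetical helper names, `PipelineGraph(...).get_dot()` is the call I understand the post to use, and it assumes apache-beam is installed with its interactive extras and that the Graphviz `dot` binary is on your PATH.

```python
import subprocess


def pipeline_dot(pipeline):
    """Return DOT text for a constructed (not necessarily run) Beam pipeline."""
    # Lazy import: assumes apache-beam[interactive] is installed.
    from apache_beam.runners.interactive.display import pipeline_graph
    return pipeline_graph.PipelineGraph(pipeline).get_dot()


def graphviz_command(dot_path, out_path, fmt="svg"):
    """Build the Graphviz command line that renders a DOT file to an image."""
    return ["dot", f"-T{fmt}", dot_path, "-o", out_path]


def save_svg(pipeline, dot_path="pipeline.dot", out_path="pipeline.svg"):
    """Write the pipeline's DOT text to disk and render it with Graphviz."""
    with open(dot_path, "w") as f:
        f.write(pipeline_dot(pipeline))
    subprocess.run(graphviz_command(dot_path, out_path), check=True)
```

Since this only needs the constructed pipeline object, it can be called from wherever the large codebase builds the pipeline, without a notebook.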

On Fri, Sep 1, 2023 at 8:40 AM Danny McCormick via user <
user@beam.apache.org> wrote:

> Hey Joey,
>
> Dataflow and the Beam Playground are two options, as you mentioned. Locally,
> many SDKs have runner options with a visual component. For example, in
> Python you can use the interactive runner with the
> apache-beam-jupyterlab-sidepanel extension
> <https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#visualize_the_data_through_the_interactive_beam_inspector>
> to view pipelines visually (this is similar to what the notebooks you
> reference are doing). You can also just call some of these pieces
> directly
> <https://stackoverflow.com/questions/72592971/way-to-visualize-beam-pipeline-run-with-directrunner>
> without an extension. Go has a dot runner
> <https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0/go/pkg/beam/runners/dot>
> that produces a visual representation of a pipeline. Java has a similar dot
> renderer <https://mehmandarov.com/apache-beam-pipeline-graph/>.
>
> Thanks,
> Danny

Re: Options for visualizing the pipeline DAG

Posted by Danny McCormick via user <us...@beam.apache.org>.
Hey Joey,

Dataflow and the Beam Playground are two options, as you mentioned. Locally,
many SDKs have runner options with a visual component. For example, in
Python you can use the interactive runner with the
apache-beam-jupyterlab-sidepanel extension
<https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#visualize_the_data_through_the_interactive_beam_inspector>
to view pipelines visually (this is similar to what the notebooks you
reference are doing). You can also just call some of these pieces
directly
<https://stackoverflow.com/questions/72592971/way-to-visualize-beam-pipeline-run-with-directrunner>
without an extension. Go has a dot runner
<https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0/go/pkg/beam/runners/dot>
that produces a visual representation of a pipeline. Java has a similar dot
renderer <https://mehmandarov.com/apache-beam-pipeline-graph/>.

Thanks,
Danny
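
In case "dot" is unfamiliar: these runners emit the pipeline as Graphviz DOT text, which the `dot` tool then lays out as an image. The toy below is not any SDK's actual implementation, just a hand-rolled illustration of the output format; `to_dot` and the `EDGES` list are made up for the example.

```python
def to_dot(edges, name="pipeline"):
    """Return Graphviz DOT text for a DAG given as (producer, consumer) pairs.

    Node names are assumed not to contain double quotes.
    """
    # Collect nodes in first-seen order so the output is deterministic.
    nodes = []
    for src, dst in edges:
        for node in (src, dst):
            if node not in nodes:
                nodes.append(node)

    lines = [f"digraph {name} {{", "  rankdir=TB;"]
    lines += [f'  "{n}" [shape=box];' for n in nodes]
    lines += [f'  "{s}" -> "{d}";' for s, d in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical word-count pipeline, written out by hand for illustration.
EDGES = [("Read", "Split"), ("Split", "Count"), ("Count", "Write")]

if __name__ == "__main__":
    print(to_dot(EDGES))
```

Saving the printed text as pipeline.dot and running `dot -Tsvg pipeline.dot -o pipeline.svg` should give a simple top-to-bottom box diagram.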

On Thu, Aug 31, 2023 at 6:38 PM Joey Tran <jo...@schrodinger.com> wrote:

> Hi all,
>
> What're all the current options for visualizing a pipeline? I'm guessing
> Dataflow has a visualization. I saw that there are also Apache Beam
> notebooks through GCP, and I'm aware of the Beam playground, but is there
> an easy way to create and view the visualization locally? For example, I
> might have a large codebase that's used to construct and run a pipeline,
> and in this case I don't think any of those three solutions would be very
> easy to use to visualize my pipeline (though I could be wrong).
>
> Best,
> Joey