Posted to user@beam.apache.org by Joey Tran <jo...@schrodinger.com> on 2023/10/19 21:00:21 UTC

Advanced Composite Transform Documentation

For the Python SDK, is there somewhere where we document more "advanced"
composite transform operations? e.g. I've been stumbling over questions
like "How do I use a transform that expects a PBegin in a composite
transform", "What's the proper way to return multiple output
PCollections?", "What's the proper way to typehint multiple output
PCollections?"

ChatGPT helped me figure out the first question (use `pcoll.pipeline`), the
second question I guessed and the third question I'm still unsure about.

Tried looking for these answers in the documentation but might just be
missing it.

Best,
Joey

Re: Advanced Composite Transform Documentation

Posted by Robert Bradshaw via user <us...@beam.apache.org>.
On Thu, Oct 19, 2023 at 2:00 PM Joey Tran <jo...@schrodinger.com> wrote:
>
> For the Python SDK, is there somewhere where we document more "advanced" composite transform operations?

I'm not sure, but
https://beam.apache.org/documentation/programming-guide/ is the
canonical place where information like this should probably be. Maybe
this users list will serve as a searchable resource at least. (Stack
Overflow can be good sometimes as well.)

> e.g. I've been stumbling with questions like "How do I use a transform that expects a PBegin in a composite transform",

As you mentioned, you do "pipeline | Transform," and you can get the
pipeline object from any PCollection you have in hand.

> "What's the proper way to return multiple output pcollections?",

You can return them as an ordinary tuple or a dict (with string
keys). This is best expressed by the TypeScript implementation
(https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/pvalue.ts#L172)
but works for Python too.

> "What's the proper way to typehint multiple output pcollections?"

Typehinting for multiple outputs is still a work in progress, but I
would just add standard Python typehints to the expand method (which
is where we'd pick them up).

> ChatGPT helped me figure out the first question (use `pcoll.pipeline`), the second question I guessed and the third question I'm still unsure about.
>
> Tried looking for these answers in the documentation but might just be missing it.
>
> Best,
> Joey