Posted to issues@beam.apache.org by "Robert Burke (Jira)" <ji...@apache.org> on 2020/05/21 16:54:00 UTC

[jira] [Created] (BEAM-10056) Side Input Validation too tight, doesn't allow CoGBK

Robert Burke created BEAM-10056:
-----------------------------------

             Summary: Side Input Validation too tight, doesn't allow CoGBK
                 Key: BEAM-10056
                 URL: https://issues.apache.org/jira/browse/BEAM-10056
             Project: Beam
          Issue Type: Bug
          Components: sdk-go
            Reporter: Robert Burke
            Assignee: Robert Burke


The following doesn't pass validation, though it should, as it's a valid signature for a ParDo accepting a PCollection<CoGBK<string, *clientHistory, *clientHistory>>:

func (fn *writer) StartBundle(ctx context.Context) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter1, iter2 func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context)

It returns an error:

Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle.
Full error:
        inserting ParDo in scope root:
        graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle [recovered]
        panic: Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle.
Full error:
        inserting ParDo in scope root:
        graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
side inputs expected in method StartBundle


This is happening in the input-unaware validation, which means that check needs to be loosened and the side inputs validated elsewhere.
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527
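
To illustrate why a check made without knowledge of the input rejects the signature above, here is a minimal sketch. It is not the code in fn.go: it assumes a simplified model in which each method is summarized only by how many iterator-shaped parameters (func(*T) bool) it declares, and the names methodShape and checkInputUnaware are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// methodShape is a hypothetical summary of a DoFn method: the number of
// iterator-shaped parameters (func(*T) bool) it declares.
type methodShape struct {
	iterParams int
}

// checkInputUnaware mimics a check made without knowing the input
// PCollection. It assumes every iterator in ProcessElement is a side input,
// so StartBundle is required to declare the same number of iterators.
func checkInputUnaware(startBundle, processElement methodShape) error {
	if processElement.iterParams != startBundle.iterParams {
		return errors.New("side inputs expected in method StartBundle")
	}
	return nil
}

func main() {
	// The writer DoFn from the report: StartBundle declares no iterators,
	// ProcessElement declares two, both of which are CoGBK value iterators.
	startBundle := methodShape{iterParams: 0}
	processElement := methodShape{iterParams: 2}
	fmt.Println(checkInputUnaware(startBundle, processElement))
}
```

Under this assumption the two CoGBK value iterators are miscounted as side inputs, which is the behaviour the report describes.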

There are "sibling" cases for the DoFn signature:

func (fn *writer) StartBundle(ctx context.Context, side func(**clientHistory) bool) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
iter, side func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context, side func(**clientHistory) bool)

and

func (fn *writer) StartBundle(ctx context.Context, side1, side2 func(**clientHistory) bool) error

func (fn *writer) ProcessElement(
ctx context.Context,
key string,
side1, side2 func(**clientHistory) bool)

func (fn *writer) FinishBundle(ctx context.Context, side1, side2 func(**clientHistory) bool)

These would be for <CoGBK<string, *clientHistory>> with <*clientHistory> on the side, and
 <string> with <*clientHistory> and <*clientHistory> on the side, respectively.

Which of these cases applies can only be fully determined with the input, so the validation should provide a clear error when PCollection binding is occurring.
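
A minimal sketch of what an input-aware check might look like follows. It assumes, for illustration only, that the bound main input can be reduced to the number of CoGBK value streams it carries (0 when it is not a CoGBK), and that any remaining iterators in ProcessElement are side inputs that StartBundle must also declare. The function validateWithInput and its parameters are hypothetical and are not part of the Beam Go SDK.

```go
package main

import "fmt"

// validateWithInput is a hypothetical input-aware check.
//
// cogbkValues is the number of value streams in the bound main input
// (0 when the main input is not a CoGBK). peIters and sbIters are the
// iterator-parameter counts of ProcessElement and StartBundle.
func validateWithInput(cogbkValues, peIters, sbIters int) error {
	// Iterators not consumed by the CoGBK main input are side inputs.
	sideInputs := peIters - cogbkValues
	if sideInputs < 0 {
		return fmt.Errorf(
			"ProcessElement declares %d iterators, but the main input is a CoGBK with %d value streams",
			peIters, cogbkValues)
	}
	if sbIters != sideInputs {
		return fmt.Errorf(
			"StartBundle declares %d side inputs, but binding implies %d (%d iterators in ProcessElement, %d consumed by the main input)",
			sbIters, sideInputs, peIters, cogbkValues)
	}
	return nil
}

func main() {
	// CoGBK<string, *clientHistory, *clientHistory>, no side inputs.
	fmt.Println(validateWithInput(2, 2, 0))
	// CoGBK<string, *clientHistory> with one side input.
	fmt.Println(validateWithInput(1, 2, 1))
	// Plain string main input with two side inputs.
	fmt.Println(validateWithInput(0, 2, 2))
	// A mismatch: StartBundle declares side inputs the binding does not imply.
	fmt.Println(validateWithInput(2, 2, 2))
}
```

The first three calls correspond to the three signatures in this report and all pass. The last call shows the kind of message that could be reported at binding time.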


