Posted to issues@beam.apache.org by "Robert Burke (Jira)" <ji...@apache.org> on 2019/12/16 20:40:00 UTC

[jira] [Commented] (BEAM-7726) [Go SDK] State Backed Iterables

    [ https://issues.apache.org/jira/browse/BEAM-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997623#comment-16997623 ] 

Robert Burke commented on BEAM-7726:
------------------------------------

At this point, state-backed iterables work and produce correct results, but I'd like to augment them with one bit of performance tuning: page lookahead.

In particular, with the protocol as it stands, the SDK spends time waiting for the runner to complete IO each time it reaches the end of a page. Even a single page of lookahead would improve pipelining.
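
A minimal sketch of what single-page lookahead could look like, not the SDK's actual implementation. The names page, fetchFunc, lookaheadReader, and fakeState are hypothetical, and the paging is simplified: an empty next token marks the last page, and the first page is requested with an explicit token. The point is only that the request for page N+1 is already in flight while page N is being processed.

```go
package main

import "fmt"

// page is one chunk of an iterable plus the token for the next chunk.
// An empty next token marks the last page (assumption of this sketch).
type page struct {
	values []string
	next   string
}

// fetchFunc stands in for a blocking state-channel read (hypothetical).
type fetchFunc func(token string) page

// lookaheadReader yields values page by page, keeping at most one
// request for the following page in flight.
type lookaheadReader struct {
	fetch   fetchFunc
	pending chan page // receives the prefetched page; nil when none is in flight
}

func newLookaheadReader(fetch fetchFunc, first string) *lookaheadReader {
	r := &lookaheadReader{fetch: fetch}
	r.prefetch(first)
	return r
}

// prefetch starts fetching the page for token in the background.
func (r *lookaheadReader) prefetch(token string) {
	ch := make(chan page, 1) // buffered so the goroutine never blocks on send
	go func() { ch <- r.fetch(token) }()
	r.pending = ch
}

// nextPage returns the next page's values, or false when exhausted.
func (r *lookaheadReader) nextPage() ([]string, bool) {
	if r.pending == nil {
		return nil, false
	}
	p := <-r.pending
	r.pending = nil
	if p.next != "" {
		r.prefetch(p.next) // overlap the next fetch with processing of p
	}
	return p.values, true
}

// readAll drains the reader into a single slice.
func readAll(r *lookaheadReader) []string {
	var out []string
	for {
		vals, ok := r.nextPage()
		if !ok {
			return out
		}
		out = append(out, vals...)
	}
}

// fakeState is an in-memory stand-in for the runner's state service.
func fakeState(token string) page {
	pages := map[string]page{
		"p0": {values: []string{"a", "b"}, next: "p1"},
		"p1": {values: []string{"c"}, next: "p2"},
		"p2": {values: []string{"d", "e"}, next: ""},
	}
	return pages[token]
}

func main() {
	r := newLookaheadReader(fakeState, "p0")
	fmt.Println(readAll(r)) // prints [a b c d e]
}
```

With a real state channel, the time spent in fetch for the next page would overlap with the user code consuming the current page, which is where the pipelining benefit would come from.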

There's a separate issue in the Go SDK: bundles processing state-backed iterables can end up blocking the data channel if the runner sends the next element on the data channel before the bundle is ready for it. The bundle doesn't read from the data channel while it's processing from the state stream, which eventually triggers the SDK's pushback and blocks other bundles on the same worker.
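
To illustrate the blocking described above, here is a toy model, not the Go SDK's real data manager. It assumes one shared, ordered stream of elements routed into bounded per-bundle queues; elem and deliver are hypothetical names. A non-blocking send is used to detect the point where a real blocking send would stall.

```go
package main

import "fmt"

// elem is a data-channel element tagged with the bundle it belongs to.
type elem struct {
	bundle string
	value  int
}

// deliver routes elements, in arrival order, to per-bundle queues.
// It stops at the first element whose queue is full, which is where a
// real blocking send would stall the whole shared stream.
// It returns how many elements were delivered before the stall.
func deliver(queues map[string]chan int, elems []elem) int {
	for i, e := range elems {
		select {
		case queues[e.bundle] <- e.value:
		default:
			return i // queue full: everything behind e is stuck too
		}
	}
	return len(elems)
}

func main() {
	queues := map[string]chan int{
		"A": make(chan int, 1), // bundle A is busy on the state stream and never drains this
		"B": make(chan int, 1), // bundle B is idle and ready for data
	}
	stream := []elem{{"A", 1}, {"A", 2}, {"B", 3}}
	n := deliver(queues, stream)
	fmt.Println(n, len(queues["B"])) // prints 1 0
}
```

Bundle B's queue has room, yet its element is never delivered because it sits behind an element for bundle A on the same ordered stream.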

Runners can solve this by terminating a bundle immediately after sending a large iterable element. It's not sufficient to "wait" until the iterable is done, since there's no signal the SDK can provide to say it's ready for more from the data channel: it's already reading values.

Alternatively, I need to find out whether the Go SDK is handling the data channel "correctly" with respect to multiplexing bundles. If runners can make use of multiple streams from the SDK (one per bundle), then ordinary gRPC multiplexing should prevent the block. However, offhand, it's not clear that could work, since the SDK side of the connection doesn't have a "say" in which bundle is being used at that point. That's not how BiDi gRPC streams work: there's no initial stream creation request to identify a stream as owned by a given bundle. In that case, it's up to the runners, I guess.

> [Go SDK] State Backed Iterables
> -------------------------------
>
>                 Key: BEAM-7726
>                 URL: https://issues.apache.org/jira/browse/BEAM-7726
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>    Affects Versions: Not applicable
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)