Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/10/23 17:25:05 UTC

[jira] [Assigned] (BEAM-10503) Document expectations around UnboundedReader advance interface

     [ https://issues.apache.org/jira/browse/BEAM-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot reassigned BEAM-10503:
------------------------------------

    Assignee:     (was: David Huntsperger)

> Document expectations around UnboundedReader advance interface
> --------------------------------------------------------------
>
>                 Key: BEAM-10503
>                 URL: https://issues.apache.org/jira/browse/BEAM-10503
>             Project: Beam
>          Issue Type: Task
>          Components: sdk-java-core
>            Reporter: Aaron Meihm
>            Priority: P3
>              Labels: Clarified, P2, stale-assigned
>
> We have implemented some custom IO classes based on UnboundedReader/UnboundedSource. These work as expected, but while doing this I noticed a few things that don't seem to be well documented, and I'm not sure whether they behave as one would anticipate.
> With the direct runner, when advance returns false repeatedly, the runner appears to apply an increasing backoff to subsequent calls to advance until it returns true, at which point the backoff is reset. This seems to be what I'd expect.
> However, when the same code is used with Dataflow, advance is called continuously, multiple times a second, for a single given UnboundedSource instance, with no backoff. With more than one instance/worker this can start to produce additional CPU load.
> I'm a bit unclear on what the right way to handle this is. For example, should you sleep in advance? I assume not, but it would be great to have documentation for this interface, especially covering how the behavior differs between runners and how to implement advance so that resources are used efficiently when no events are available from the underlying source.
>  
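
To make the "should you sleep in advance?" question concrete, here is a minimal sketch of one possible pattern: a short, bounded wait on an in-memory buffer instead of an unconditional sleep. It is not documented Beam guidance, and whether any given runner tolerates even a brief wait in advance is exactly what this issue asks to have documented. The class does not extend Beam's UnboundedReader; PollingReaderSketch, MAX_WAIT_MILLIS and offer are hypothetical names used only for illustration.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Stand-alone sketch that mirrors the shape of a reader's advance()/getCurrent()
 * pair. It deliberately has no Beam dependency; all names are hypothetical.
 */
public class PollingReaderSketch {
    // Assumption: an upper bound on how long a single advance() call may wait.
    private static final long MAX_WAIT_MILLIS = 10;

    // In a real source a background fetcher would fill this buffer.
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private String current;

    /** Hypothetical hook for handing a record to the reader. */
    public void offer(String record) {
        buffer.add(record);
    }

    /**
     * Returns true if a record was read, false if none arrived within the bound.
     * The wait ends as soon as a record is available, so it adds no latency
     * when data is flowing and caps the cost of a tight polling loop when idle.
     */
    public boolean advance() {
        try {
            String next = buffer.poll(MAX_WAIT_MILLIS, TimeUnit.MILLISECONDS);
            if (next == null) {
                return false;
            }
            current = next;
            return true;
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and report "no data yet".
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns the record read by the last successful advance(). */
    public String getCurrent() {
        if (current == null) {
            throw new NoSuchElementException("advance() has not returned true yet");
        }
        return current;
    }

    public static void main(String[] args) {
        PollingReaderSketch reader = new PollingReaderSketch();
        System.out.println("empty source: " + reader.advance());
        reader.offer("event-1");
        System.out.println("after offer: " + reader.advance() + " " + reader.getCurrent());
    }
}
```

The design choice here is that an idle reader spends its time parked on the queue rather than spinning, while a caller that applies its own backoff (as the direct runner appears to) is delayed by at most MAX_WAIT_MILLIS per call.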



--
This message was sent by Atlassian Jira
(v8.3.4#803005)