Posted to dev@beam.apache.org by Aviem Zur <av...@gmail.com> on 2016/11/29 08:42:26 UTC

UnboundedSource backlog num events

Hi,

Today UnboundedSource exposes the split backlog in bytes via
getSplitBacklogBytes().

I think there is a lot of value in also exposing the backlog as a number
of events, since that number can be easier for a human to interpret than
bytes. Something like getSplitBacklogEvents() or getSplitBacklogCount().
Thoughts?
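
To make the proposal concrete, here is a minimal standalone sketch of what a
count-based signal next to the byte-based one could look like. It does not use
the Beam SDK: BacklogReporter, InMemorySplit, and the BACKLOG_UNKNOWN sentinel
value are hypothetical names made up for illustration, and only
getSplitBacklogBytes() and the proposed getSplitBacklogCount() come from the
message above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical interface, NOT part of Beam: a byte-based backlog plus the proposed count. */
interface BacklogReporter {
    /** Assumed sentinel meaning "backlog is not known". */
    long BACKLOG_UNKNOWN = -1L;

    /** Size in bytes of the records this split has not yet read. */
    long getSplitBacklogBytes();

    /** Proposed addition: number of unread records. Defaults to unknown so
     *  implementations that only track bytes keep compiling unchanged. */
    default long getSplitBacklogCount() {
        return BACKLOG_UNKNOWN;
    }
}

/** Toy in-memory split that tracks both signals for its pending records. */
final class InMemorySplit implements BacklogReporter {
    private final Deque<byte[]> pending = new ArrayDeque<>();
    private long pendingBytes = 0L;

    /** Enqueue a record and account for its size. */
    void add(byte[] record) {
        pending.addLast(record);
        pendingBytes += record.length;
    }

    /** Read the next record, or null if the split is drained. */
    byte[] poll() {
        byte[] record = pending.pollFirst();
        if (record != null) {
            pendingBytes -= record.length;
        }
        return record;
    }

    @Override
    public long getSplitBacklogBytes() {
        return pendingBytes;
    }

    @Override
    public long getSplitBacklogCount() {
        return pending.size();
    }
}
```

Giving the count method a default of "unknown" is one possible design choice:
a runner could then fall back to the byte-based signal for sources that never
implement it.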

Re: UnboundedSource backlog num events

Posted by Dan Halperin <dh...@google.com.INVALID>.
Hi Aviem,

Another good question. There's no strong reason not to have Count in
addition to Bytes.

Practically, in the Dataflow runner we found bytes to be the best signal
here. I won't go deeply into why, but two intuitions:
* Beam is designed to enable runners to minimize the per-element overhead;
that's why we have multi-element bundles in the first place.
* If serialization is one of the main overheads in your system, then bytes
is often what you are going to care about.

That said, other runners may (and surely do) work very differently from
Dataflow. It's totally reasonable to add these signals to the APIs if
there is a runner that would benefit from using them!

Dan
