You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Thiago Chiarato <ch...@gmail.com> on 2020/10/05 17:28:10 UTC

DataflowRunner and PubSub Acknowledge

Hi,
I'm trying to discover how and when Dataflow acknowledges an inflight
message from PubSub.
Could you please help me with where I should start investigating this
behavior of DataflowRunner and PubSub?

Best regards,
Thiago

Re: DataflowRunner and PubSub Acknowledge

Posted by Jeff Klukas <jk...@mozilla.com>.
Hi Thiago,

Note that Dataflow has a custom implementation of PubSub interaction, so
the code you see in PubsubIO in the Beam codebase does not necessarily
reflect Pubsub handling in Dataflow.

Dataflow acks messages as soon as they are first checkpointed, so the first
step in your pipeline that introduces a GroupByKey operation will be the
point at which messages get acked. This means that the checkpointed state
in a streaming Dataflow job reading from PubSub is significant. A job that
is cancelled without draining could lead to messages acked in the
subscription that were never fully processed by your pipeline.

On Mon, Oct 5, 2020 at 1:28 PM Thiago Chiarato <ch...@gmail.com> wrote:

> Hi,
> I'm trying to discover how and when Dataflow acknowledges an inflight
> message from PubSub.
> Could you please help me with where I should start investigating this
> behavior of DataflowRunner and PubSub?
>
> Best regards,
> Thiago
>