You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@beam.apache.org by "Sam Whittle (Jira)" <ji...@apache.org> on 2020/11/10 09:28:00 UTC

[jira] [Assigned] (BEAM-11216) StreamingDataflowWorker ReaderCache usage can be incorrect in presence of retries

     [ https://issues.apache.org/jira/browse/BEAM-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Whittle reassigned BEAM-11216:
----------------------------------

    Assignee: Sam Whittle

> StreamingDataflowWorker ReaderCache usage can be incorrect in presence of retries
> ---------------------------------------------------------------------------------
>
>                 Key: BEAM-11216
>                 URL: https://issues.apache.org/jira/browse/BEAM-11216
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>
> This is similar to BEAM-7547.  The issue and identified there was related to the state cache.  However for UnboundedSource there is a separate ReaderCache.  In particular the following sequence could lead to using a Reader at the wrong position.
> 1. Work arrives indicating to use reader at state C1
> 2. Work processes and advances reader to C2, commit is prepared with elements output from C1 to C2 and reader checkpoint for C2
> 3. Commit of original processing fails (perhaps routed to previous backend during autoscaling)
> 4. Work is retried (still indicating to use reader at C1 and still same cache token) however the ReaderCache is used, there is a hit, and the existing reader (positioned at C2) is used.
> 5. Retry of work processes and advances reader to C3, commit is scheduled with elements output from C2 to C3 and reader checkpoint for C3
> 6. Commit succeeds
> At this point there was never a successful commit for the elements between C1 to C2, though the reader is now advanced past them.
> Possible fixes:
> 1. Use increasing work token as in BEAM-7547 to detect retry and not use the ReaderCache entry
> 2. Seek recovered reader from cache to checkpoint or only use if it matches the checkpoint. This would probably involve changing the Checkpoint interface so 1 is likely preferred.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)