You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Sam Whittle (Jira)" <ji...@apache.org> on 2020/11/10 09:28:00 UTC
[jira] [Assigned] (BEAM-11216) StreamingDataflowWorker ReaderCache
usage can be incorrect in presence of retries
[ https://issues.apache.org/jira/browse/BEAM-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Whittle reassigned BEAM-11216:
----------------------------------
Assignee: Sam Whittle
> StreamingDataflowWorker ReaderCache usage can be incorrect in presence of retries
> ---------------------------------------------------------------------------------
>
> Key: BEAM-11216
> URL: https://issues.apache.org/jira/browse/BEAM-11216
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: P2
>
> This is similar to BEAM-7547. The issue and identified there was related to the state cache. However for UnboundedSource there is a separate ReaderCache. In particular the following sequence could lead to using a Reader at the wrong position.
> 1. Work arrives indicating to use reader at state C1
> 2. Work processes and advances reader to C2, commit is prepared with elements output from C1 to C2 and reader checkpoint for C2
> 3. Commit of original processing fails (perhaps routed to previous backend during autoscaling)
> 4. Work is retried (still indicating to use reader at C1 and still same cache token) however the ReaderCache is used, there is a hit, and the existing reader (positioned at C2) is used.
> 5. Retry of work processes and advances reader to C3, commit is scheduled with elements output from C2 to C3 and reader checkpoint for C3
> 6. Commit succeeds
> At this point there was never a successful commit for the elements between C1 to C2, though the reader is now advanced past them.
> Possible fixes:
> 1. Use increasing work token as in BEAM-7547 to detect retry and not use the ReaderCache entry
> 2. Seek recovered reader from cache to checkpoint or only use if it matches the checkpoint. This would probably involve changing the Checkpoint interface so 1 is likely preferred.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)