You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/22 19:04:04 UTC

[GitHub] [beam] boyuanzz commented on pull request #13592: [BEAM-11403] Cache UnboundedReader per CheckpointMark in SDF Wrapper DoFn.

boyuanzz commented on pull request #13592:
URL: https://github.com/apache/beam/pull/13592#issuecomment-749721610


   > One more concern - the current implementation relies on `CheckpointMark#hashCode` and `CheckpointMark#equals`. It is likely that these will not have these two correctly implemented. We should stick to `Coder#structuralValue` for that.
   > The same holds true for UnboundedSource#hashCode and equals, we should probably not use the source in the cache key, because it should be impossible for single DoFn instance to read from multiple readers.
   
   I was thinking about using `Coder#structuralValue` but I'm concerning about the additional overhead from encoding. The cacheKey is created very frequently(at least twice per element) and it's not cheap for coder to encode a value. 
   
   As you mentioned, a DoFn instance could process multiple sources especially the source allows initial split(and we cannot assume that CheckpointMark contains the source info, although in most case it does), that's why I decided to use `UnboundedSourceRestriction` to locate a reader. DirectRunner is using timers(InMemoryTimerInternals) and states(in memory as well) to reschedule checkpoints. It should be a reference instead of a deep copy?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org