Posted to commits@beam.apache.org by "Jingsong Lee (JIRA)" <ji...@apache.org> on 2017/04/06 16:26:41 UTC

[jira] [Commented] (BEAM-1723) FlinkRunner should deduplicate when an UnboundedSource requires Deduping

    [ https://issues.apache.org/jira/browse/BEAM-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959226#comment-15959226 ] 

Jingsong Lee commented on BEAM-1723:
------------------------------------

I see {{CachedIdDeduplicator}} in the direct runner. It uses a {{LoadingCache}} to deduplicate, with expireAfterAccess set to 10 minutes and maximumSize set to 100_000. Do these two values need to be parameterized?

Do these caches need to be snapshotted in the Flink runner for fault tolerance?
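To make both questions concrete, here is a minimal, stdlib-only sketch of a record-ID deduplicator bounded the same two ways: entries expire after a period without access, and at most maxSize IDs are kept. It is not Beam's {{CachedIdDeduplicator}} and does not use Guava's {{LoadingCache}}. An access-ordered {{LinkedHashMap}} stands in for the cache. The class name {{BoundedIdDeduplicator}}, the injectable clock, the {{snapshot()}}/{{restore()}} methods, and the use of String IDs (Beam readers expose record IDs as byte arrays) are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical deduplicator: IDs expire after expireAfterAccessMillis without
 * access, and at most maxSize IDs are retained. Not Beam or Guava code.
 */
final class BoundedIdDeduplicator {
  private final long expireAfterAccessMillis;
  private final int maxSize;
  private final LongSupplier clock; // injectable so expiry is testable
  // accessOrder = true: iteration runs from least to most recently accessed.
  private final LinkedHashMap<String, Long> lastAccess;

  BoundedIdDeduplicator(long expireAfterAccessMillis, int maxSize, LongSupplier clock) {
    this.expireAfterAccessMillis = expireAfterAccessMillis;
    this.maxSize = maxSize;
    this.clock = clock;
    this.lastAccess = new LinkedHashMap<String, Long>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        // Size bound: drop the least recently accessed ID after an insert.
        return size() > BoundedIdDeduplicator.this.maxSize;
      }
    };
  }

  /** Returns true if the ID has not been seen within the expiry window. */
  boolean isNew(String id) {
    long now = clock.getAsLong();
    evictExpired(now);
    // put() counts as an access: it moves the ID to the fresh end of the map.
    return lastAccess.put(id, now) == null;
  }

  int size() {
    return lastAccess.size();
  }

  /** Copies the live IDs, oldest access first, e.g. to store in a checkpoint. */
  List<String> snapshot() {
    evictExpired(clock.getAsLong());
    return new ArrayList<>(lastAccess.keySet());
  }

  /** Re-seeds this deduplicator from a snapshot; restored IDs get a fresh window. */
  void restore(List<String> ids) {
    long now = clock.getAsLong();
    for (String id : ids) {
      lastAccess.put(id, now);
    }
  }

  private void evictExpired(long now) {
    Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (now - e.getValue() < expireAfterAccessMillis) {
        break; // access order means every later entry is fresher
      }
      it.remove();
    }
  }
}
```

Passing 10 minutes and 100_000 to the constructor reproduces the direct runner's values, and exposing them as arguments is one way they could be parameterized. In this sketch, checkpointing the output of {{snapshot()}} alongside the source's checkpoint would let a restored reader keep rejecting IDs it emitted before a failure; without it, the cache starts empty after recovery.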

> FlinkRunner should deduplicate when an UnboundedSource requires Deduping
> ------------------------------------------------------------------------
>
>                 Key: BEAM-1723
>                 URL: https://issues.apache.org/jira/browse/BEAM-1723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Thomas Groh
>
> UnboundedSource implementations can require deduping, and the FlinkRunner currently logs a warning that this is not supported.
> https://github.com/apache/beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L139



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)