Posted to commits@beam.apache.org by "Tyler Akidau (JIRA)" <ji...@apache.org> on 2017/08/31 18:39:00 UTC

[jira] [Commented] (BEAM-1197) Slowly-changing external data as a side input

    [ https://issues.apache.org/jira/browse/BEAM-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149414#comment-16149414 ] 

Tyler Akidau commented on BEAM-1197:
------------------------------------

For the record, there are additional thoughts on temporal joins in [http://s.apache.org/streaming-sql-spec], mostly regarding the need for broadcast streams and windows that can shrink, e.g., validity windows.

> Slowly-changing external data as a side input
> ---------------------------------------------
>
>                 Key: BEAM-1197
>                 URL: https://issues.apache.org/jira/browse/BEAM-1197
>             Project: Beam
>          Issue Type: Wish
>          Components: beam-model
>            Reporter: Eugene Kirpichov
>
> I've repeatedly seen the following pattern: a user wants to join a PCollection against a slowly-changing external dataset, such as a file on GCS or a Bigtable table.
> Side inputs come to mind, but current side input mechanisms don't allow for something like periodically reloading the side input.
> The best hacky solution I came up with for one use case is documented here: http://stackoverflow.com/questions/41254028/can-dataflow-sideinput-be-updated-per-window-by-reading-a-gcs-bucket/41271159#41271159 . We need to do better than this.
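
To make the requested behavior concrete, here is a minimal sketch in plain Python of a lookup table that is reloaded from an external source once it is older than a fixed interval, and is then used to enrich a stream of elements. It does not use the Beam API and is not the Stack Overflow workaround linked above. The names `RefreshingLookup`, `enrich`, `loader`, and `ttl_seconds` are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


class RefreshingLookup:
    """Caches an external dataset and reloads it once it is older than ttl_seconds.

    Hypothetical helper for illustration only; not part of any Beam API.
    """

    def __init__(
        self,
        loader: Callable[[], Dict[str, str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # reads the external dataset (file, table, ...)
        self._ttl = ttl_seconds        # maximum age of the cached copy
        self._clock = clock            # injectable so the refresh logic can be tested
        self._data: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def get(self) -> Dict[str, str]:
        """Return the cached dataset, reloading it first if it has expired."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._data = self._loader()
            self._loaded_at = now
        return self._data


def enrich(
    elements: Iterable[Tuple[str, int]], lookup: RefreshingLookup
) -> Iterator[Tuple[str, int, Optional[str]]]:
    """Join each (key, value) element against the current version of the lookup."""
    for key, value in elements:
        yield key, value, lookup.get().get(key)
```

In a real pipeline the open questions are the ones this issue raises: how every worker sees a consistent version of the data, and which version a given window should be joined against. This sketch sidesteps both by keeping a single in-process cache.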


