Posted to commits@beam.apache.org by "Amit Sela (JIRA)" <ji...@apache.org> on 2017/01/21 21:05:26 UTC

[jira] [Created] (BEAM-1294) Long running UnboundedSource Readers via Broadcasts

Amit Sela created BEAM-1294:
-------------------------------

             Summary: Long running UnboundedSource Readers via Broadcasts
                 Key: BEAM-1294
                 URL: https://issues.apache.org/jira/browse/BEAM-1294
             Project: Beam
          Issue Type: Improvement
          Components: runner-spark
            Reporter: Amit Sela
            Assignee: Amit Sela


When reading from an UnboundedSource, the current implementation causes each split to create a new Reader on every micro-batch.

As long as the overhead of creating a reader is relatively low, this is reasonable (though I'd still be happy to get rid of it), but when the creation overhead is large it becomes unreasonable, because it forces large batches to amortize that cost.

One way to solve this could be to create a pool of lazily initialized readers to serve each executor, maybe via Broadcast variables.
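
A rough sketch of the pooling idea, only to make the proposal concrete. It is not the runner's code and uses no Beam or Spark APIs: LazyReaderPool, getOrCreate and size are hypothetical names, the split id type K and reader type R are placeholders, and how the pool or its factory would actually reach each executor (Broadcast or otherwise) is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical pool that creates at most one reader per split, on first use.
 * K is a placeholder for a split identifier, R for the reader type.
 */
final class LazyReaderPool<K, R> {
  // One cached reader per split id; safe for concurrent tasks in one JVM.
  private final Map<K, R> readers = new ConcurrentHashMap<>();
  // Assumed to wrap the expensive reader creation; must not return null.
  private final Function<K, R> factory;

  LazyReaderPool(Function<K, R> factory) {
    this.factory = factory;
  }

  /** Returns the cached reader for the split, creating it only on the first call. */
  R getOrCreate(K splitId) {
    return readers.computeIfAbsent(splitId, factory);
  }

  /** Number of readers created so far. */
  int size() {
    return readers.size();
  }
}
```

With something like this, a micro-batch would call getOrCreate(splitId) instead of constructing a Reader, so the creation cost is paid once per split per pool rather than once per batch. Closing readers and evicting idle ones are deliberately left out of the sketch.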



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)