Posted to commits@beam.apache.org by "Amit Sela (JIRA)" <ji...@apache.org> on 2017/01/21 21:05:26 UTC
[jira] [Created] (BEAM-1294) Long running UnboundedSource Readers via Broadcasts
Amit Sela created BEAM-1294:
-------------------------------
Summary: Long running UnboundedSource Readers via Broadcasts
Key: BEAM-1294
URL: https://issues.apache.org/jira/browse/BEAM-1294
Project: Beam
Issue Type: Improvement
Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela
When reading from an UnboundedSource, the current implementation causes each split to create a new Reader in every micro-batch.
As long as the overhead of creating a reader is relatively low, this is reasonable (though I'd still be happy to get rid of it), but when the creation overhead is large it becomes unreasonable, forcing large batches to amortize it.
One way to solve this could be to create a pool of lazily initialized readers to serve each executor, maybe via Broadcast variables.
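
A minimal sketch of the pooling idea, in plain Java and independent of Spark and Beam. The names LazyReaderPool, get, and size are hypothetical and appear nowhere in the runner; the "reader" is just a String standing in for a real UnboundedSource reader. How such a pool would reach each executor (for example through a Broadcast variable or a per-JVM static holder) is left open here. The sketch only shows that a reader is created on first use for a split and reused in later micro-batches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: keeps one lazily created reader per split id.
public class LazyReaderPool<K, R> {
    private final ConcurrentHashMap<K, R> readers = new ConcurrentHashMap<>();
    private final Function<K, R> factory;

    public LazyReaderPool(Function<K, R> factory) {
        this.factory = factory;
    }

    // Returns the existing reader for this split, creating it only on first use.
    public R get(K splitId) {
        return readers.computeIfAbsent(splitId, factory);
    }

    // Number of readers created so far.
    public int size() {
        return readers.size();
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        // The factory stands in for an expensive reader creation (assumption).
        LazyReaderPool<Integer, String> pool = new LazyReaderPool<>(splitId -> {
            created.incrementAndGet();
            return "reader-for-split-" + splitId;
        });

        // Simulate three micro-batches over two splits.
        for (int batch = 0; batch < 3; batch++) {
            for (int splitId = 0; splitId < 2; splitId++) {
                pool.get(splitId);
            }
        }
        System.out.println("readers created: " + created.get()); // prints "readers created: 2"
    }
}
```

With this shape, the creation cost is paid once per split per pool rather than once per split per micro-batch. A real implementation would also need to close readers and handle checkpoint state, which this sketch does not attempt.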
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)