Posted to dev@beam.apache.org by Chaim Turkel <ch...@behalf.com> on 2017/09/26 13:56:35 UTC

Pipeline performance

Hi,
  I am transforming multiple tables (about 20) from MongoDB to BigQuery.
Currently I have one pipeline for each table, and each table is a
collection. Is there a limit on how many collections I can have?
Would it be better to create multiple pipelines?


chaim

Re: Pipeline performance

Posted by Lukasz Cwik <lc...@google.com.INVALID>.
It is usually better to create a single pipeline, since work is load-balanced
across your different tables and I would expect the single pipeline to finish
sooner than waiting for all of the separate pipelines to finish.
Also, different runners support different pipeline sizes. For example, users
have submitted pipelines to Google Cloud Dataflow with up to about 1,000 steps.
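
A minimal sketch of what a single consolidated pipeline could look like,
assuming the Beam Java SDK with MongoDbIO and BigQueryIO. The connection
string, database and table names, and the docToRow/schemaFor helpers are
placeholders for illustration, not anything from this thread:

import java.util.Arrays;
import java.util.List;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.mongodb.MongoDbIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.bson.Document;

public class MongoToBigQuery {

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // One read/convert/write branch per collection, all in the same pipeline.
    List<String> collections = Arrays.asList("table_a", "table_b"); // ~20 names in practice

    for (String collection : collections) {
      p.apply("Read_" + collection,
              MongoDbIO.read()
                  .withUri("mongodb://localhost:27017")   // placeholder connection string
                  .withDatabase("mydb")                   // placeholder database name
                  .withCollection(collection))
       .apply("ToTableRow_" + collection,
              MapElements.into(TypeDescriptor.of(TableRow.class))
                  .via(doc -> docToRow(doc)))
       .setCoder(TableRowJsonCoder.of())
       .apply("Write_" + collection,
              BigQueryIO.writeTableRows()
                  .to("my-project:my_dataset." + collection)   // placeholder table spec
                  .withSchema(schemaFor(collection))
                  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));
    }

    p.run().waitUntilFinish();
  }

  // Placeholder conversion: stores the whole document as a JSON string.
  // Replace with a real per-field mapping for your tables.
  static TableRow docToRow(Document doc) {
    return new TableRow().set("payload", doc.toJson());
  }

  // Placeholder schema matching docToRow above; replace with your real schemas.
  static TableSchema schemaFor(String collection) {
    return new TableSchema().setFields(
        Arrays.asList(new TableFieldSchema().setName("payload").setType("STRING")));
  }
}

Each collection becomes its own named branch of the same Pipeline object, so
the runner can balance work from all of them in one job instead of running
20 separate jobs.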



On Tue, Sep 26, 2017 at 6:56 AM, Chaim Turkel <ch...@behalf.com> wrote:

> Hi,
>   I am transforming multiple tables (about 20) from MongoDB to BigQuery.
> Currently I have one pipeline for each table, and each table is a
> collection. Is there a limit on how many collections I can have?
> Would it be better to create multiple pipelines?
>
>
> chaim
>