You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Pavlo Pohrrebnyi (Jira)" <ji...@apache.org> on 2019/12/16 10:50:00 UTC

[jira] [Closed] (BEAM-7403) BigQueryIO.Write does not autoscale correctly (idle workers)

     [ https://issues.apache.org/jira/browse/BEAM-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavlo Pohrrebnyi closed BEAM-7403.
----------------------------------
    Fix Version/s: 2.11.0
       Resolution: Won't Fix

Dataflow Runner was fixed by Google

> BigQueryIO.Write does not autoscale correctly (idle workers)
> ------------------------------------------------------------
>
>                 Key: BEAM-7403
>                 URL: https://issues.apache.org/jira/browse/BEAM-7403
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Pavlo Pohrrebnyi
>            Priority: Major
>             Fix For: 2.11.0
>
>
> Apache Beam version:
> 2.10
> JAVA SDK
> Dataflow GCP Staged
> Details:
> We have a streaming dataflow which ingests data into BigQuery (Streaming Inserts).
> We deploy a job with max number of workers = 40 and
> there is a huge backlog already (high watermark).
> When the dataflow starts it scales 0 -> 3 (from 0 to 3 workers)
> and starts ingesting with 12000 messages/sec rate.
> After 2 mins it scales 3 -> 40 to keep up with a backlog.
> After scaling up, the rate never goes higher than it was with 3 nodes (12000 messages/sec).
> We have memory consumption metrics in Stackdriver; from them
> we see that the first 3 workers consume about 5GB of RAM and the rest 37 workers
> consume about 0.2GB RAM. It appears that these autoscaled Nodes are idle? Importantly, they don’t add to Streaming Inserts process for BigQuery.
> Autoscaling in the other streaming pipelines we have works fine.
> It appears that this is related to BigQuery streaming inserts.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)