Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/08/10 17:08:20 UTC

[jira] [Commented] (BEAM-6831) python sdk WriteToBigQuery excessive usage of metered API

    [ https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174754#comment-17174754 ] 

Beam JIRA Bot commented on BEAM-6831:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> python sdk WriteToBigQuery excessive usage of metered API
> ---------------------------------------------------------
>
>                 Key: BEAM-6831
>                 URL: https://issues.apache.org/jira/browse/BEAM-6831
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.10.0
>            Reporter: Pesach Weinstock
>            Priority: P2
>              Labels: bigquery, dataflow, gcp, python, stale-P2
>         Attachments: apache-beam-py-sdk-gcp-bq-api-issue.png
>
>
> The python sdk's {{beam.io.gcp.bigquery.WriteToBigQuery}} transform calls the following API more often than necessary:
> [https://www.googleapis.com/bigquery/v2/projects/<project-name>/datasets/<dataset-name>/tables/<table-name>?alt=json|https://www.googleapis.com/bigquery/v2/projects/%3Cproject-name%3E/datasets/%3Cdataset-name%3E/tables/%3Ctable-name%3E?alt=json]
> This request counts against BigQuery API quotas that are separate from the streaming insert quotas. When used in a streaming pipeline, we hit this quota quickly and can then no longer write any data to BigQuery.
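
The endpoint above is the BigQuery tables.get REST call. As a minimal sketch (the helper name and the placeholder project/dataset/table values are illustrative, not from the Beam codebase), the URL being hit repeatedly has this shape:

```python
# Sketch of the metered tables.get URL that WriteToBigQuery requests;
# project/dataset/table values are placeholders for illustration.
def tables_get_url(project, dataset, table):
    return ("https://www.googleapis.com/bigquery/v2/projects/"
            f"{project}/datasets/{dataset}/tables/{table}?alt=json")
```

Each such request is billed against the BigQuery API request quota rather than the streaming-insert quota, which is why frequent calls exhaust it.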
> Dispositions being used are:
>  * create_disposition: {{beam.io.BigQueryDisposition.CREATE_NEVER}}
>  * write_disposition: {{beam.io.BigQueryDisposition.WRITE_APPEND}}
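
A minimal sketch of a streaming pipeline using these dispositions (the Pub/Sub topic, project, dataset, and table names are hypothetical placeholders, not the reporter's actual pipeline; running it requires GCP credentials and an existing table):

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical streaming pipeline; topic and table are placeholders.
options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "ReadEvents" >> beam.io.ReadFromPubSub(
           topic="projects/my-project/topics/events")
     | "ParseJson" >> beam.Map(json.loads)
     | "WriteToBQ" >> beam.io.WriteToBigQuery(
           table="my-project:my_dataset.my_table",
           # Per the report above, even with CREATE_NEVER (table must
           # already exist) the sink still issues frequent tables.get
           # calls, which count against the metered API quota.
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```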
> This currently blocks us from using BigQueryIO to write to BigQuery from a streaming pipeline, and forced us to formally request an API quota increase from Google as a temporary workaround.
> Our pipeline uses DataflowRunner. The error is shown below and in the attached screenshot of the Stackdriver trace.
> {code:json}
>   "errors": [
>     {
>       "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
>       "domain": "usageLimits",
>       "reason": "rateLimitExceeded"
>     }
>   ],
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)