Posted to issues@beam.apache.org by "Pesach Weinstock (JIRA)" <ji...@apache.org> on 2019/03/14 16:13:00 UTC

[jira] [Created] (BEAM-6831) python sdk WriteToBigQuery excessive usage of metered API

Pesach Weinstock created BEAM-6831:
--------------------------------------

             Summary: python sdk WriteToBigQuery excessive usage of metered API
                 Key: BEAM-6831
                 URL: https://issues.apache.org/jira/browse/BEAM-6831
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
    Affects Versions: 2.10.0
            Reporter: Pesach Weinstock
         Attachments: apache-beam-py-sdk-gcp-bq-api-issue.png

The Python SDK's {{beam.io.gcp.bigquery.WriteToBigQuery}} currently appears to call the following API more often than needed:

https://www.googleapis.com/bigquery/v2/projects/<project-name>/datasets/<dataset-name>/tables/<table-name>?alt=json

The above request counts against BigQuery API quotas that are separate from the quota for streaming inserts. In a streaming pipeline we exhaust this quota quickly, after which no further data can be written to BigQuery.
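To illustrate what "more often than needed" could mean, here is a minimal sketch of memoizing the table lookup so that repeated writes to the same table issue the metered GET only once per process. This is not Beam's actual code: {{CachedTableLookup}}, {{fake_fetch}}, and the project/dataset/table names are hypothetical stand-ins, and the loop of 1000 batches is an assumed workload.

```python
from typing import Callable, Dict, Tuple

TableKey = Tuple[str, str, str]  # (project, dataset, table)


class CachedTableLookup:
    """Memoizes table metadata so each table is fetched at most once per process."""

    def __init__(self, fetch: Callable[[str, str, str], dict]) -> None:
        self._fetch = fetch      # hypothetical function performing the metered GET
        self._cache: Dict[TableKey, dict] = {}
        self.requests_made = 0   # counts lookups that actually reach the API

    def get(self, project: str, dataset: str, table: str) -> dict:
        key = (project, dataset, table)
        if key not in self._cache:
            self.requests_made += 1
            self._cache[key] = self._fetch(project, dataset, table)
        return self._cache[key]


def fake_fetch(project: str, dataset: str, table: str) -> dict:
    # Stand-in for the real table GET request; returns dummy metadata.
    return {"id": f"{project}:{dataset}.{table}"}


lookup = CachedTableLookup(fake_fetch)
for _ in range(1000):  # assumed workload: 1000 batches written to one table
    lookup.get("my-project", "my_dataset", "my_table")
```

With {{CREATE_NEVER}} the table is expected to exist already, so a cache of this kind would trade a small risk of stale metadata for far fewer metered requests.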

Dispositions being used are:
 * create_disposition: {{beam.io.BigQueryDisposition.CREATE_NEVER}}
 * write_disposition: {{beam.io.BigQueryDisposition.WRITE_APPEND}}

This currently blocks us from using BigQueryIO to write to BigQuery in a streaming pipeline, and we had to formally request an API quota increase from Google as a temporary workaround.

Our pipeline uses DataflowRunner. I am unable to attach screenshots to this JIRA, but the following message appears in the logs:
{code:java}
"error": {
  "code": 403,
  "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
  "errors": [
    {
      "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
      "domain": "usageLimits",
      "reason": "rateLimitExceeded"
    }
  ],
  "status": "PERMISSION_DENIED"
}{code}
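For anyone triaging similar logs, the following is a minimal sketch of how a caller might recognize this specific failure from the response body. It assumes the fragment above arrives wrapped in an enclosing JSON object; {{is_rate_limit_error}} and {{sample_body}} are hypothetical names, not part of Beam or the BigQuery client.

```python
import json


def is_rate_limit_error(body: str) -> bool:
    """Returns True if the body is a 403 carrying a rateLimitExceeded reason."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON at all
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    # Collect the "reason" of every nested error entry.
    reasons = {
        entry.get("reason")
        for entry in error.get("errors", [])
        if isinstance(entry, dict)
    }
    return error.get("code") == 403 and "rateLimitExceeded" in reasons


# The logged error from this report, wrapped in braces to form valid JSON.
sample_body = json.dumps({
    "error": {
        "code": 403,
        "message": "Exceeded rate limits: too many api requests per user "
                   "per method for this user_method.",
        "errors": [{
            "message": "Exceeded rate limits: too many api requests per user "
                       "per method for this user_method.",
            "domain": "usageLimits",
            "reason": "rateLimitExceeded",
        }],
        "status": "PERMISSION_DENIED",
    }
})

print(is_rate_limit_error(sample_body))
```

Note that the status is {{PERMISSION_DENIED}} even though the cause is a rate limit, so checking the nested {{reason}} field is what distinguishes this case from a genuine permissions problem.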
 


