Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 14:37:57 UTC

[GitHub] [beam] damccorm opened a new issue, #19970: Long BigQuery dry runs cause avalanche delay

damccorm opened a new issue, #19970:
URL: https://github.com/apache/beam/issues/19970

   Reproduction Steps:
   
   1. Compose a BigQuery SELECT query that will take over 80 seconds for a dry run.
   2. Run the query with Beam SDK's BigQueryIO.
   3. Observe the 10+ minute delay before the actual query job is created.
   
   When running readTableRows(), BigQueryIO attempts to estimate the query size by performing a dry run, even if withoutValidation() is set. If the request takes over 80 seconds (RetryHttpRequestInitializer.HANGING_GET_TIMEOUT_SEC), RetryHttpRequestInitializer times out and retries, up to 9 times (BigQueryServicesImpl.MAX_RPC_RETRIES). Hence, once a dry run crosses the 80-second tipping point, the timeouts compound into a delay of 9 × 80 = 720 seconds (12 minutes). Since size estimation is not required to run the query [1], BigQueryIO should provide a way to avoid this unnecessary delay, especially for time-critical enterprise workloads.
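
   The tipping point described above can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: the dry run takes the same time on every attempt, no backoff sleep is added between attempts, and the pipeline proceeds once the attempts are exhausted. The class and method names are hypothetical, and the two constants are copied from the values quoted in this report rather than imported from Beam.

```java
public class DryRunDelayModel {
    // Values quoted in the report; the names mirror the Beam constants
    // but are redefined here so the sketch stays self-contained.
    static final long HANGING_GET_TIMEOUT_SEC = 80;
    static final int MAX_RPC_RETRIES = 9;

    /**
     * Modelled delay before the actual query job can be created.
     * Assumption: a dry run that fits within the timeout succeeds on the
     * first attempt; one that does not fit times out on every attempt.
     */
    static long delayBeforeQueryJobSec(long dryRunSec, long timeoutSec, int attempts) {
        if (dryRunSec <= timeoutSec) {
            return dryRunSec; // completes within the first attempt
        }
        return timeoutSec * attempts; // every attempt runs into the timeout
    }

    public static void main(String[] args) {
        for (long dryRunSec : new long[] {60, 80, 81, 120}) {
            long delay = delayBeforeQueryJobSec(
                dryRunSec, HANGING_GET_TIMEOUT_SEC, MAX_RPC_RETRIES);
            System.out.println(dryRunSec + " s dry run -> " + delay + " s delay");
        }
    }
}
```

   Under this model an 80-second dry run costs 80 seconds, while an 81-second dry run costs 720 seconds, which is the jump the report calls an avalanche. Raising the timeout only moves the threshold; it does not remove the jump.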
   
   There can be several ways to address this:
   - increasing the timeout threshold (which will still create a tipping point);
   - preventing the dry run requests from retrying; or
   - adding an option to skip the size estimation within serializeToCloudSource().
   
   [1] https://github.com/apache/beam/blob/2ec3b0495c191597c9a88830d25a2c360b3277e0/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/internal/CustomSources.java#L75
   
   Imported from Jira [BEAM-8906](https://issues.apache.org/jira/browse/BEAM-8906). Original Jira may contain additional context.
   Reported by: juneoh.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org