Posted to issues@beam.apache.org by "Aurélien L (Jira)" <ji...@apache.org> on 2020/08/13 13:15:00 UTC

[jira] [Commented] (BEAM-4548) Long execution delay when using DirectRunner to read from BigQuery Table

    [ https://issues.apache.org/jira/browse/BEAM-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177002#comment-17177002 ] 

Aurélien L commented on BEAM-4548:
----------------------------------

Same issue here; I can't figure out why.

> Long execution delay when using DirectRunner to read from BigQuery Table
> ------------------------------------------------------------------------
>
>                 Key: BEAM-4548
>                 URL: https://issues.apache.org/jira/browse/BEAM-4548
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, runner-direct
>    Affects Versions: 2.4.0
>            Reporter: Brian Foo
>            Priority: P3
>
> When using DirectRunner to execute a simple select query against a BigQuery table that contains 100 rows, the pipeline stalls for over 3 minutes. The BigQuery UI can run the same query in under 2 seconds.
> A similar issue was reported here: https://stackoverflow.com/questions/46907735/beam-direct-runner-slow-bigquery-read
> I ran a thread dump using VisualVM, and it seems the main thread was in a state of backoff:
> java.lang.Thread.State: TIMED_WAITING (sleeping)
>  at java.lang.Thread.sleep(Native Method)
>  at com.google.api.client.util.Sleeper$1.sleep(Sleeper.java:43)
>  at com.google.api.client.util.BackOffUtils.next(BackOffUtils.java:50)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.nextBackOff(BigQueryServicesImpl.java:870)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.access$500(BigQueryServicesImpl.java:79)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.pollJob(BigQueryServicesImpl.java:273)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$JobServiceImpl.pollJob(BigQueryServicesImpl.java:247)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.executeQuery(BigQueryQuerySource.java:191)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:136)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:103)
>  at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:134)
>  at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:210)
>  at org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:87)
>  at org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:62)
>  at org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:144)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:201)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
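
The trace above shows the main thread sleeping inside a job-polling loop. As a rough illustration only, the sketch below simulates how a poll-then-back-off loop can notice a finished job later than the job actually completed. It is not Beam's implementation: the class name, method name, and every delay value are hypothetical, and it does not claim to explain the 3-minute stall reported here.

```java
public class PollBackoffSketch {

    /**
     * Simulates a poller that checks a job, then sleeps with exponential backoff.
     * All parameters are hypothetical; time is simulated, nothing actually sleeps.
     *
     * @return the simulated time (ms) at which the poller first sees the job as done
     */
    public static long detectedAtMillis(
            long jobDoneAtMillis, long initialDelayMillis, double multiplier, long maxDelayMillis) {
        long now = 0;
        long delay = initialDelayMillis;
        while (true) {
            // Poll: the job counts as done once simulated time reaches its finish time.
            if (now >= jobDoneAtMillis) {
                return now;
            }
            // Back off: "sleep" for the current delay, then grow it up to the cap.
            now += delay;
            delay = Math.min((long) (delay * multiplier), maxDelayMillis);
        }
    }

    public static void main(String[] args) {
        // Job finishes after 2 s; polls happen at 0 s, 1 s, 3 s.
        System.out.println(detectedAtMillis(2_000, 1_000, 2.0, 60_000));  // prints 3000
        // Same job, but a 10 s first delay: completion is only seen at 10 s.
        System.out.println(detectedAtMillis(2_000, 10_000, 2.0, 60_000)); // prints 10000
    }
}
```

In this toy model the gap between job completion and detection is bounded by the current backoff delay, so comparing the observed stall against the configured polling intervals is one way to check whether backoff alone could account for it.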



--
This message was sent by Atlassian Jira
(v8.3.4#803005)