You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 18:07:13 UTC

[GitHub] [beam] damccorm opened a new issue, #20541: BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between responses

damccorm opened a new issue, #20541:
URL: https://github.com/apache/beam/issues/20541

   When reading a couple of million rows (and above 100 Gigabytes) from BigQuery Storage (DIRECT_READ) with Dataflow and above 8 vCPUs (4x n1-standard-4) the attached exception is thrown about once per vCPU.
   
   The issue seems to be that the value of fraction_consumed in the [StreamStatus](https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1beta1#streamstatus) object returned from the Storage API decreased between responses.
   
   I tested this repeatedly with varying amounts of input data, number of workers and machine types and was able to reproduce the issue repeatedly with different configurations above 4 -8- vCPUs used (2, 4, 16, 32, 128 and, n1-highmem-4, n1-standard-4, n1-standard-8, and n1-standard-16).
   
   So far Jobs with 4 -8- vCPUs ran fine. (Update: Latest job with 4 vCPUs, 2x n1-highmem-4, also threw the exception).
   
   `Error message from worker: java.io.IOException: Failed to advance reader of source: name: "projects/REDACTED/locations/eu/streams/REDACTED" org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:620) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:399) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417) org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386) org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(Batch
 DataflowWorker.java:311) org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) java.util.concurrent.FutureTask.run(FutureTask.java:266) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: Fraction consumed from the current response (0.7945484519004822) has to be larger than or equal to the fraction consumed from the previous response (0.8467302322387695). org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) org.apache.beam.sd
 k.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:242) org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.advance(BigQueryStorageStreamSource.java:211) org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:617) ... 14 more`
   
   Imported from Jira [BEAM-11061](https://issues.apache.org/jira/browse/BEAM-11061). Original Jira may contain additional context.
   Reported by: j.grabber.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #20541: BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between responses

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #20541:
URL: https://github.com/apache/beam/issues/20541#issuecomment-1334585826

   @reuvenlax @johnjcasey do you know if this is resolved now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on issue #20541: BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between responses

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #20541:
URL: https://github.com/apache/beam/issues/20541#issuecomment-1337607617

   Agreed, I'll close this as obsolete


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #20541: BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between responses

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #20541:
URL: https://github.com/apache/beam/issues/20541#issuecomment-1334586191

   I'm guessing issues with this functionality are obsolete since enough has changed. This was filed before 2.25.0 it looks like.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey closed issue #20541: BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between responses

Posted by GitBox <gi...@apache.org>.
johnjcasey closed issue #20541: BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between responses
URL: https://github.com/apache/beam/issues/20541


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org