Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/07 10:42:02 UTC

[GitHub] [beam] mdlnr commented on pull request #23234: Update BQIO to a single scheduled executor service reduce threads

mdlnr commented on PR #23234:
URL: https://github.com/apache/beam/pull/23234#issuecomment-1305416050

   > This PR causes the BigQuery client to hang on shutdown.
   
   @reuvenlax We observed hanging threads in our Dataflow jobs during `StreamWriter.close` with Beam version 2.41.0, so the PR may not cause this problem but may make it more visible.
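   To check from inside a worker whether threads are piling up in `StreamWriter.close`, one option is to count live threads whose current stack contains that frame. This is a minimal sketch using only the JDK's `Thread.getAllStackTraces()`; `StuckCloseCounter` and `countThreadsIn` are hypothetical helper names, not part of Beam or the BigQuery client.

   ```java
   import java.util.Map;

   public final class StuckCloseCounter {
     /** Counts live threads whose current stack contains a frame for className.methodName. */
     static long countThreadsIn(String className, String methodName) {
       Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
       return traces.values().stream()
           .filter(stack -> {
             for (StackTraceElement frame : stack) {
               if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                 return true; // this thread is currently somewhere inside the target method
               }
             }
             return false;
           })
           .count();
     }

     public static void main(String[] args) {
       long stuck = countThreadsIn("com.google.cloud.bigquery.storage.v1.StreamWriter", "close");
       System.out.println("threads inside StreamWriter.close: " + stuck);
     }
   }
   ```

   Sampling this value periodically (for example, from a metrics reporter) would show whether the count only ever grows, which matches the symptom described here.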
   
   In our pipelines we see a continuously increasing number of threads with stack traces like these:
   
   ```
   "pool-3-thread-61" #1767 prio=5 os_prio=0 cpu=0.22ms elapsed=337556.83s tid=0x00007f2d982459d0 nid=0x6f7 in Object.wait()  [0x00007f2d886f1000]
      java.lang.Thread.State: WAITING (on object monitor)
   	at java.lang.Object.wait(java.base@17.0.2/Native Method)
   	- waiting on <no object reference available>
   	at java.lang.Thread.join(java.base@17.0.2/Thread.java:1304)
   	- locked <merged>(a java.lang.Thread)
   	at java.lang.Thread.join(java.base@17.0.2/Thread.java:1372)
   	at com.google.cloud.bigquery.storage.v1.StreamWriter.close(StreamWriter.java:369)
   	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.close(BigQueryServicesImpl.java:1339)
   	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$$Lambda$692/0x00000008015dd4c8.run(Unknown Source)
   	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords.lambda$runAsyncIgnoreFailure$1(StorageApiWritesShardedRecords.java:138)
   	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$$Lambda$685/0x00000008015d6b58.run(Unknown Source)
   	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@17.0.2/Executors.java:539)
   	at java.util.concurrent.FutureTask.run(java.base@17.0.2/FutureTask.java:264)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.2/ThreadPoolExecutor.java:1136)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.2/ThreadPoolExecutor.java:635)
   	at java.lang.Thread.run(java.base@17.0.2/Thread.java:833)
   ```
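   The trace shows a pool thread parked in `Thread.join` inside `StreamWriter.close`, which was submitted asynchronously via `runAsyncIgnoreFailure`. The sketch below reproduces that pattern with plain JDK classes under two assumptions: (1) a writer whose `close()` does an unbounded join on an internal worker thread that never exits, and (2) closes submitted to an unbounded, cached-style pool. `FakeWriter` and `HangingCloseDemo` are hypothetical; the real `StreamWriter` internals and Beam's executor configuration are not visible in the trace. Under these assumptions each stuck close pins one pool thread, so the thread count grows with every writer closed.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.SynchronousQueue;
   import java.util.concurrent.ThreadPoolExecutor;
   import java.util.concurrent.TimeUnit;

   public final class HangingCloseDemo {
     /** Hypothetical stand-in for a writer whose close() joins an internal worker thread. */
     static final class FakeWriter {
       private final CountDownLatch shutdownSignal = new CountDownLatch(1);
       private final Thread worker;

       FakeWriter() {
         worker = new Thread(() -> {
           try {
             shutdownSignal.await(); // runs until signalled; close() never signals it
           } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
           }
         }, "fake-writer-worker");
         worker.setDaemon(true);
         worker.start();
       }

       /** Mirrors the trace: an unbounded Thread.join() that blocks while the worker is alive. */
       void close() throws InterruptedException {
         worker.join();
       }

       /** Lets the worker exit so the demo can clean up. */
       void release() {
         shutdownSignal.countDown();
       }
     }

     /** Submits one async close per writer and returns how many pool threads exist afterwards. */
     static int poolSizeAfterAsyncCloses(int writers) throws InterruptedException {
       // Same settings as Executors.newCachedThreadPool(): a new thread whenever none is idle.
       ThreadPoolExecutor pool =
           new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
       List<FakeWriter> opened = new ArrayList<>();
       for (int i = 0; i < writers; i++) {
         FakeWriter w = new FakeWriter();
         opened.add(w);
         pool.execute(() -> {
           try {
             w.close(); // parks this pool thread in Thread.join
           } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
           }
         });
       }
       int size = pool.getPoolSize(); // one stuck pool thread per pending close
       opened.forEach(FakeWriter::release); // unblock everything so the JVM can exit
       pool.shutdown();
       pool.awaitTermination(10, TimeUnit.SECONDS);
       return size;
     }

     public static void main(String[] args) throws InterruptedException {
       System.out.println("pool threads: " + poolSizeAfterAsyncCloses(5)); // prints "pool threads: 5"
     }
   }
   ```

   Replacing the unbounded join with a bounded one (`Thread.join(long millis)`) would stop pool threads from piling up, though it would leave the worker itself running.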
   
   We think this is related to the pretty old version of the dependency `com.google.cloud:google-cloud-bigquerystorage:2.12.2` that Beam uses. Since then, the library has made many changes to the `StreamWriter` class that could fix this issue. Is there anything that prevents updating to a newer released version?
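   Because the version resolved on a worker's classpath can differ from the one a given Beam release declares, it may help to confirm which `google-cloud-bigquerystorage` build is actually loaded. A minimal sketch, assuming the class is visible to the application class loader: it prints the jar manifest's `Implementation-Version` (which may be absent) and the location the class was loaded from (usually a jar path that includes the version). `DependencyVersion` and its methods are hypothetical helpers.

   ```java
   import java.net.URL;
   import java.security.CodeSource;
   import java.util.Optional;

   public final class DependencyVersion {
     /** Implementation-Version from the jar manifest, if the jar declares one. */
     static Optional<String> manifestVersion(Class<?> cls) {
       Package pkg = cls.getPackage();
       return Optional.ofNullable(pkg == null ? null : pkg.getImplementationVersion());
     }

     /** Where the class was loaded from; empty for JDK bootstrap classes. */
     static Optional<String> codeLocation(Class<?> cls) {
       CodeSource src = cls.getProtectionDomain().getCodeSource();
       if (src == null) {
         return Optional.empty();
       }
       URL loc = src.getLocation();
       return Optional.ofNullable(loc == null ? null : loc.toString());
     }

     public static void main(String[] args) {
       String name = args.length > 0 ? args[0] : "com.google.cloud.bigquery.storage.v1.StreamWriter";
       try {
         // initialize=false: look the class up without running its static initializers
         Class<?> cls = Class.forName(name, false, DependencyVersion.class.getClassLoader());
         System.out.println("version:  " + manifestVersion(cls).orElse("<not in manifest>"));
         System.out.println("location: " + codeLocation(cls).orElse("<unknown>"));
       } catch (ClassNotFoundException e) {
         System.out.println(name + " is not on the classpath");
       }
     }
   }
   ```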


-- 
This is an automated message from the Apache Git Service.