Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/07 18:49:16 UTC

[GitHub] [beam] reuvenlax commented on pull request #23234: Update BQIO to a single scheduled executor service reduce threads

reuvenlax commented on PR #23234:
URL: https://github.com/apache/beam/pull/23234#issuecomment-1306040743

   Shutdown hangs were definitely being caused by that PR. Note that the hang
   seen in that PR was shutting down the RPC client, not shutting down the
   StreamWriter. This appears to be a different issue.
   
   FYI, Beam at current head depends on 2.24.2.
   
   On Mon, Nov 7, 2022 at 2:42 AM Thorsten Madlener ***@***.***>
   wrote:
   
   > This PR causes the BigQuery client to hang on shutdown.
   >
   > @reuvenlax <https://github.com/reuvenlax> We observed the behaviour of
   > hanging threads in our Dataflow jobs during StreamWriter.close with Beam
   > Version 1.41.0. So maybe the PR does not cause this problem but might make
   > it more visible.
   >
   > In our pipelines we see a continuously increasing number of threads with
   > stack traces like these:
   >
   > "pool-3-thread-61" #1767 prio=5 os_prio=0 cpu=0.22ms elapsed=337556.83s tid=0x00007f2d982459d0 nid=0x6f7 in Object.wait()  [0x00007f2d886f1000]
   >    java.lang.Thread.State: WAITING (on object monitor)
   > 	at ***@***.***/Native Method)
   > 	- waiting on <no object reference available>
   > 	at ***@***.***/Thread.java:1304)
   > 	- locked <merged>(a java.lang.Thread)
   > 	at ***@***.***/Thread.java:1372)
   > 	at com.google.cloud.bigquery.storage.v1.StreamWriter.close(StreamWriter.java:369)
   > 	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.close(BigQueryServicesImpl.java:1339)
   > 	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$$Lambda$692/0x00000008015dd4c8.run(Unknown Source)
   > 	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords.lambda$runAsyncIgnoreFailure$1(StorageApiWritesShardedRecords.java:138)
   > 	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWritesShardedRecords$$Lambda$685/0x00000008015d6b58.run(Unknown Source)
   > 	at ***@***.***/Executors.java:539)
   > 	at ***@***.***/FutureTask.java:264)
   > 	at ***@***.***/ThreadPoolExecutor.java:1136)
   > 	at ***@***.***/ThreadPoolExecutor.java:635)
   > 	at ***@***.***/Thread.java:833)
   >
   > We think this is related to using a pretty old version of the dependency
   > com.google.cloud:google-cloud-bigquerystorage:2.12.2. The library has
   > since made many changes in the StreamWriter class which could fix this
   > issue. Is there anything that prevents updating to a newer released version?
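
The stack trace above shows pool threads parked in `Thread.join` inside `StreamWriter.close`. As a rough illustration only, here is a minimal sketch of how a blocking `close()` can pin an executor thread and how bounding the wait releases it. Everything in it is hypothetical: `BoundedClose`, `HangingWriter`, and `closeWithTimeout` are made-up names, and this is not how Beam or the BigQuery storage client handles shutdown.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedClose {

  /** Hypothetical stand-in for a writer whose close() joins a worker thread. */
  static final class HangingWriter implements AutoCloseable {
    private final Thread worker;

    HangingWriter(Thread worker) {
      this.worker = worker;
    }

    @Override
    public void close() throws InterruptedException {
      // Blocks for as long as the worker is alive; if the worker never
      // exits, the calling thread stays in WAITING, as in the dump above.
      worker.join();
    }
  }

  /**
   * Runs resource.close() on the executor and waits at most timeoutMillis.
   * Returns true if close() completed normally in time. On timeout the task
   * is cancelled with interruption so the pool thread is released.
   */
  static boolean closeWithTimeout(
      ExecutorService executor, AutoCloseable resource, long timeoutMillis)
      throws InterruptedException {
    Future<?> future =
        executor.submit(
            () -> {
              resource.close();
              return null;
            });
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      // Interrupts the pool thread, which unblocks Thread.join().
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      // close() itself threw; treat it as an unclean close.
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    // A worker that never finishes on its own, simulating a stuck writer thread.
    Thread stuckWorker =
        new Thread(
            () -> {
              try {
                Thread.sleep(Long.MAX_VALUE);
              } catch (InterruptedException ignored) {
                // Exit when interrupted.
              }
            });
    stuckWorker.setDaemon(true);
    stuckWorker.start();

    ExecutorService executor = Executors.newSingleThreadExecutor();
    boolean closed = closeWithTimeout(executor, new HangingWriter(stuckWorker), 200);
    System.out.println("closed cleanly: " + closed);

    executor.shutdown();
    boolean drained = executor.awaitTermination(2, TimeUnit.SECONDS);
    System.out.println("executor drained: " + drained);

    stuckWorker.interrupt();
  }
}
```

This only works when the blocked call responds to interruption, which `Thread.join` does. Whether a bounded close would be appropriate for a real `StreamWriter` is a separate question that this sketch does not answer.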
   

