You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/03/10 18:54:00 UTC

[jira] [Updated] (BEAM-14067) Beam Core IO: WriteFiles stuck

     [ https://issues.apache.org/jira/browse/BEAM-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-14067:
-----------------------------------
    Component/s: io-java-gcp
                     (was: extensions-java-gcp)
                     (was: sdk-java-core)

> Beam Core IO: WriteFiles stuck
> ------------------------------
>
>                 Key: BEAM-14067
>                 URL: https://issues.apache.org/jira/browse/BEAM-14067
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-files, io-java-gcp
>            Reporter: John Gerassimou
>            Priority: P2
>
> In our pipeline, workers are sometimes stuck at step {_}WriteShardsIntoTempFiles{_}. We believe this happens when GCS fails to respond for more than 10 minutes and Google's VPC cuts the connection. The Java code will just hang forever because it's waiting on FIN or RST from the peer. 
> Environment: Java 8, Beam 2.34/2.35, Dataflow runner.
> See below for relevant information from logs, threads, netstat. 
> +Log snippet:+
> 1)
> {code:java}
> Operation ongoing in step Write Events To File/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles for at least 09h15m00s without outputting or completing in state process
>   at org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
>   at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:392)
>   at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
>   at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:474)
>   at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:461)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
>   at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
>   at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
>   at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
>   at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
>   at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
>   at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
>   at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
>   at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
>   at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:148)
>   at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
>   at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
>   at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
>   at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
>   at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
>   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1955)
>   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1857)
>   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getWriteGeneration(GoogleCloudStorageImpl.java:1926)
>   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.create(GoogleCloudStorageImpl.java:374)
>   at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:476)
>   at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:446)
>   at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:127)
>   at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:60)
>   at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:239)
>   at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:226)
>   at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:935)
>   at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:768)
>   at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown Source){code}
> 2)
>  
> {code:java}
> Operation ongoing in step Write Events To File/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles for at least 04h55m00s without outputting or completing in state process
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(BaseAbstractGoogleAsyncWriteChannel.java:247)
>   at com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.close(BaseAbstractGoogleAsyncWriteChannel.java:168)
>   at org.apache.beam.sdk.io.FileBasedSink$Writer.close(FileBasedSink.java:1029)
>   at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:932)
>   at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown Source) {code}
>  
> +Threads snippet:+
>  
> {code:java}
> "Thread-140" - Thread t@204
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:920)
>         at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:884)
>         at org.conscrypt.ConscryptEngineSocket$SSLInputStream.access$100(ConscryptEngineSocket.java:706)
>         at org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:230)
>         at org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:209)
>         - locked <27a37219> (a java.lang.Object)
>         at org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:547)
>         at org.conscrypt.ConscryptEngineSocket.getOutputStream(ConscryptEngineSocket.java:290)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
>         - locked <1edf4d14> (a sun.net.www.protocol.https.HttpsClient)
>         at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
>         at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
>         at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
>         at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
>         at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
>         at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:148)
>         at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
>         at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
>         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
>         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1970)
>         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1864)
>         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getWriteGeneration(GoogleCloudStorageImpl.java:1938)
>         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.create(GoogleCloudStorageImpl.java:411)
>         at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:490)
>         at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:471)
>         at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:144)
>         at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:62)
>         at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:243)
>         at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:230)
>         at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:951)
>         at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:921)
>         at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown Source)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:232)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:188)
>         at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:339)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
>         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:185)
>         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1058)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner$$Lambda$1284/1132939800.output(Unknown Source)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:445)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:130)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1061)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:771)
>         at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:97)
>         at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:43)
>         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:121)
>         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
>         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
>         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:137)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:212)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:163)
>         at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:92)
>         at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1437)
>         at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1100(StreamingDataflowWorker.java:165)
>         at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run(StreamingDataflowWorker.java:1113)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)   Locked ownable synchronizers:
>         - locked <4b11862a> (a java.util.concurrent.ThreadPoolExecutor$Worker {code}
>  
> +Netstat snippet:+
> {code:java}
> tcp        1      0 our-pipeline:34942    internal-ip:443        CLOSE_WAIT {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)