You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 23:38:48 UTC

[GitHub] [beam] damccorm opened a new issue, #21492: Beam Core IO: WriteFiles stuck

damccorm opened a new issue, #21492:
URL: https://github.com/apache/beam/issues/21492

   In our pipeline, workers are sometimes stuck at step __WriteShardsIntoTempFiles__. We believe this happens when GCS fails to respond for more than 10 minutes and Google's VPC cuts the connection. The Java code will just hang forever because it's waiting on FIN or RST from the peer. 
   
   Environment: Java 8, Beam 2.34/2.35, Dataflow runner.
   
   See below for relevant information from logs, threads, netstat. 
   
   ****Log snippet:****
   
   1)
   ```
   
   Operation ongoing in step Write Events To File/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles
   for at least 09h15m00s without outputting or completing in state process
     at org.conscrypt.NativeCrypto.SSL_do_handshake(Native
   Method)
     at org.conscrypt.NativeSsl.doHandshake(NativeSsl.java:392)
     at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:225)
    
   at org.conscrypt.ConscryptFileDescriptorSocket.waitForHandshake(ConscryptFileDescriptorSocket.java:474)
    
   at org.conscrypt.ConscryptFileDescriptorSocket.getOutputStream(ConscryptFileDescriptorSocket.java:461)
    
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
     at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    
   at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
     at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
    
   at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
    
   at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
     at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
    
   at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
    
   at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
     at
   com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:148)
     at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
    
   at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
     at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
    
   at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
    
   at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
    
   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1955)
    
   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1857)
    
   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getWriteGeneration(GoogleCloudStorageImpl.java:1926)
    
   at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.create(GoogleCloudStorageImpl.java:374)
     at
   org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:476)
     at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:446)
    
   at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:127)
     at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:60)
    
   at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:239)
     at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:226)
    
   at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:935)
     at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:768)
    
   at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown
   Source)
   ```
   
   2)
   
    
   ```
   
   Operation ongoing in step Write Events To File/WriteFiles/WriteShardedBundlesToTempFiles/WriteShardsIntoTempFiles
   for at least 04h55m00s without outputting or completing in state process
     at sun.misc.Unsafe.park(Native
   Method)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
     at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
    
   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
     at com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(BaseAbstractGoogleAsyncWriteChannel.java:247)
    
   at com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.close(BaseAbstractGoogleAsyncWriteChannel.java:168)
    
   at org.apache.beam.sdk.io.FileBasedSink$Writer.close(FileBasedSink.java:1029)
     at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:932)
    
   at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown
   Source) 
   ```
   
    
   
   ****Threads snippet:****
   
    
   ```
   
   "Thread-140" - Thread t@204
      java.lang.Thread.State: RUNNABLE
           at java.net.SocketInputStream.socketRead0(Native
   Method)
           at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
           at java.net.SocketInputStream.read(SocketInputStream.java:171)
    
         at java.net.SocketInputStream.read(SocketInputStream.java:141)
           at org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket(ConscryptEngineSocket.java:920)
    
         at org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket(ConscryptEngineSocket.java:884)
    
         at org.conscrypt.ConscryptEngineSocket$SSLInputStream.access$100(ConscryptEngineSocket.java:706)
    
         at org.conscrypt.ConscryptEngineSocket.doHandshake(ConscryptEngineSocket.java:230)
           at
   org.conscrypt.ConscryptEngineSocket.startHandshake(ConscryptEngineSocket.java:209)
           - locked
   <27a37219> (a java.lang.Object)
           at org.conscrypt.ConscryptEngineSocket.waitForHandshake(ConscryptEngineSocket.java:547)
    
         at org.conscrypt.ConscryptEngineSocket.getOutputStream(ConscryptEngineSocket.java:290)
        
     at sun.net.www.http.HttpClient.openServer(HttpClient.java:465)
           at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    
         - locked <1edf4d14> (a sun.net.www.protocol.https.HttpsClient)
           at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    
         at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
           at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
    
         at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
      
       at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
        
     at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
    
         at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:162)
    
         at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:148)
          
   at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:84)
           at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1012)
    
         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:514)
    
         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:455)
    
         at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
    
         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1970)
    
         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1864)
    
         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getWriteGeneration(GoogleCloudStorageImpl.java:1938)
    
         at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.create(GoogleCloudStorageImpl.java:411)
    
         at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:490)
           at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.create(GcsUtil.java:471)
    
         at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:144)
    
         at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.create(GcsFileSystem.java:62)
      
       at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:243)
           at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:230)
    
         at org.apache.beam.sdk.io.FileBasedSink$Writer.open(FileBasedSink.java:951)
           at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn.processElement(WriteFiles.java:921)
    
         at org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn$DoFnInvoker.invokeProcessElement(Unknown
   Source)
           at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:232)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:188)
    
         at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:339)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
    
         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:185)
    
         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:108)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.lambda$onTrigger$1(ReduceFnRunner.java:1058)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner$$Lambda$1284/1132939800.output(Unknown
   Source)
           at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnContextFactory$OnTriggerContextImpl.output(ReduceFnContextFactory.java:445)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SystemReduceFn.onTrigger(SystemReduceFn.java:130)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.onTrigger(ReduceFnRunner.java:1061)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.ReduceFnRunner.onTimers(ReduceFnRunner.java:771)
    
         at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:97)
    
         at org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:43)
    
         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:121)
    
         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73)
    
         at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80)
    
         at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:137)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:212)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:163)
    
         at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:92)
    
         at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1437)
    
         at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1100(StreamingDataflowWorker.java:165)
    
         at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$7.run(StreamingDataflowWorker.java:1113)
    
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748) 
    Locked ownable synchronizers:
           - locked <4b11862a> (a java.util.concurrent.ThreadPoolExecutor$Worker
   
   ```
   
    
   
   ****Netstat snippet:****
   ```
   
   tcp        1      0 our-pipeline:34942    internal-ip:443        CLOSE_WAIT 
   ```
   
    
   
   Imported from Jira [BEAM-14067](https://issues.apache.org/jira/browse/BEAM-14067). Original Jira may contain additional context.
   Reported by: yianni.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] cnauroth commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
cnauroth commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1254307315

   Hello @scwhittle . We have released GCS connector 2.2.8 containing the fix for Conscrypt socket hangs.
   
   https://mvnrepository.com/artifact/com.google.cloud.bigdataoss/gcs-connector/hadoop3-2.2.8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] scwhittle commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
scwhittle commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1252149485

   .take-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] scwhittle commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
scwhittle commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1254799340

   Thanks Chris! https://github.com/apache/beam/pull/23300 to update Beam


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] scwhittle commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
scwhittle commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1237803242

   I believe this is https://github.com/google/conscrypt/issues/864 and will be addressed once https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/865 is in a 2.2.x release and beam depedencies are so google_cloud_bigdataoss_version includes this version.
   
   In the meantime this can be avoided by disabling conscrypt


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm closed issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
damccorm closed issue #21492: Beam Core IO: WriteFiles stuck
URL: https://github.com/apache/beam/issues/21492


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] scwhittle commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
scwhittle commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1239102866

   The fix should be in the 2.2.8 hadoop-connectors release, no ETA yet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] bvolpato commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1651827081

   The reason for rollbacks here (`use_grpc_for_gcs` experiment) broke for other reasons, and I'm attempting an upgrade now to 2.2.16: https://github.com/apache/beam/pull/27679
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] scwhittle commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
scwhittle commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1285086957

   Reopening as #23300 is being rolled back due to #23588
   
   Workarounds in the meantime are to:
   - not enable the conscrypt security provider via enable_conscrypt_security_provider experiment
   - update the bigdataoss:gcsio library used manually to 2.2.8 (and not use use_grpc_for_gcs experiment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] scwhittle commented on issue #21492: Beam Core IO: WriteFiles stuck

Posted by GitBox <gi...@apache.org>.
scwhittle commented on issue #21492:
URL: https://github.com/apache/beam/issues/21492#issuecomment-1285082192

   .reopen-issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org