Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 00:17:44 UTC

[GitHub] [beam] damccorm opened a new issue, #21540: Jenkins worker sometimes crashes while running Python Flink pipeline

damccorm opened a new issue, #21540:
URL: https://github.com/apache/beam/issues/21540

   Example failure from [https://ci-beam.apache.org/job/beam_PostCommit_Python37/5184/](https://ci-beam.apache.org/job/beam_PostCommit_Python37/5184/)
   ```
   
   >>> RUNNING integration tests with pipeline options: --runner=FlinkRunner --project=apache-beam-testing
   --environment_type=LOOPBACK --temp_location=gs://temp-storage-for-end-to-end-tests/temp-it
   --flink_job_server_jar=/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar
   >>>   pytest options: apache_beam/io/gcp/bigquery_read_it_test.py apache_beam/io/external/xlang_jdbcio_it_test.py
   apache_beam/io/external/xlang_kafkaio_it_test.py apache_beam/io/external/xlang_kinesisio_it_test.py
   apache_beam/io/external/xlang_debeziumio_it_test.py --log-cli-level=INFO
   
   ...
   
   15:27:18 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:116 Starting service with ['java' '-jar'
   '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar'
   '--flink-master' '[auto]' '--artifacts-dir' '/tmp/beam-temp34uahjm8/artifactsfzc4uc4c' '--job-port'
   '56343' '--artifact-port' '0' '--expansion-port' '0']
   15:27:18 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:20 PM software.amazon.awssdk.regions.internal.util.EC2MetadataUtils getItems'
   15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'WARNING: Unable to retrieve the requested metadata.'
   15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:20 PM org.apache.beam.sdk.io.aws2.s3.DefaultS3ClientBuilderFactory createBuilder'
   15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b"INFO: The AWS S3 Beam extension was included in this build, but the awsRegion flag was not specified. If you don't plan to use S3, then ignore this message."
   15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createArtifactStagingService'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: ArtifactStagingService started on localhost:36631'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: Java ExpansionService started on localhost:35729'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: JobService started on localhost:56343'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver run'
   15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: Job server now running, terminate with Ctrl+C'
   15:27:21 FATAL: command execution failed
   15:27:21 java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
   15:27:21     at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
   
   ...
   
   FATAL: command execution failed
   java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
     at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
     at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:286)
   ```
   
   
   Perhaps this was a random crash, or the worker got overloaded. Other suites running at the same time:
   
   beam_BiqQueryIO_Streaming_Performance_Test_Java #3729
   beam_LoadTests_Java_CoGBK_Dataflow_V2_Streaming_Java17 #134
   beam_LoadTests_Python_GBK_Dataflow_Batch #1060
   
   also crashed, but at that moment those tests had already launched Dataflow jobs and were streaming log output. Only the beam_PostCommit_Python37 suite appeared to be running something intensive on the worker.
   
   Filing to see how frequently this happens.
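
One possible way to track the frequency is to scan saved console logs for the disconnect message. The sketch below is only an illustration: the regular expression is taken from the log excerpt above, while the function name `find_disconnected_workers` and the `sample` text are hypothetical and not part of Beam or Jenkins.

```python
import re

# Message format copied from the console log excerpt in this issue.
# The quoted worker name (e.g. 'apache-beam-jenkins-10') is captured.
DISCONNECT_RE = re.compile(r"Backing channel '([^']+)' is disconnected")


def find_disconnected_workers(console_text):
    """Return a dict mapping worker name to the number of disconnect messages.

    Hypothetical helper for illustration; it only does text matching and
    does not talk to Jenkins.
    """
    counts = {}
    for match in DISCONNECT_RE.finditer(console_text):
        worker = match.group(1)
        counts[worker] = counts.get(worker, 0) + 1
    return counts


# Assumed sample input, shaped like the lines in the excerpt above.
sample = (
    "15:27:21 FATAL: command execution failed\n"
    "15:27:21 java.io.IOException: Backing channel "
    "'apache-beam-jenkins-10' is disconnected.\n"
)

print(find_disconnected_workers(sample))
```

Running this over the console text of several builds and summing the counts per worker would show whether the crashes cluster on particular machines.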
   
   Imported from Jira [BEAM-14407](https://issues.apache.org/jira/browse/BEAM-14407). Original Jira may contain additional context.
   Reported by: tvalentyn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org