Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 00:17:44 UTC
[GitHub] [beam] damccorm opened a new issue, #21540: Jenkins worker sometimes crashes while running Python Flink pipeline
damccorm opened a new issue, #21540:
URL: https://github.com/apache/beam/issues/21540
Example failure from [https://ci-beam.apache.org/job/beam_PostCommit_Python37/5184/](https://ci-beam.apache.org/job/beam_PostCommit_Python37/5184/)
```
>>> RUNNING integration tests with pipeline options: --runner=FlinkRunner --project=apache-beam-testing --environment_type=LOOPBACK --temp_location=gs://temp-storage-for-end-to-end-tests/temp-it --flink_job_server_jar=/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar
>>> pytest options: apache_beam/io/gcp/bigquery_read_it_test.py apache_beam/io/external/xlang_jdbcio_it_test.py apache_beam/io/external/xlang_kafkaio_it_test.py apache_beam/io/external/xlang_kinesisio_it_test.py apache_beam/io/external/xlang_debeziumio_it_test.py --log-cli-level=INFO
...
15:27:18 INFO apache_beam.utils.subprocess_server:subprocess_server.py:116 Starting service with ['java' '-jar' '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar' '--flink-master' '[auto]' '--artifacts-dir' '/tmp/beam-temp34uahjm8/artifactsfzc4uc4c' '--job-port' '56343' '--artifact-port' '0' '--expansion-port' '0']
15:27:18 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:20 PM software.amazon.awssdk.regions.internal.util.EC2MetadataUtils getItems'
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'WARNING: Unable to retrieve the requested metadata.'
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:20 PM org.apache.beam.sdk.io.aws2.s3.DefaultS3ClientBuilderFactory createBuilder'
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b"INFO: The AWS S3 Beam extension was included in this build, but the awsRegion flag was not specified. If you don't plan to use S3, then ignore this message."
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createArtifactStagingService'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: ArtifactStagingService started on localhost:36631'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: Java ExpansionService started on localhost:35729'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: JobService started on localhost:56343'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver run'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: Job server now running, terminate with Ctrl+C'
15:27:21 FATAL: command execution failed
15:27:21 java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
15:27:21 at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
...
FATAL: command execution failed
java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:286)
```
Perhaps this was a random crash, or the worker got overloaded. Other suites running at the same time (beam_BiqQueryIO_Streaming_Performance_Test_Java #3729, beam_LoadTests_Java_CoGBK_Dataflow_V2_Streaming_Java17 #134, and beam_LoadTests_Python_GBK_Dataflow_Batch #1060) also crashed, but at that moment those tests had already launched Dataflow jobs and were only streaming log output. Only the beam_PostCommit_Python37 suite appeared to be running something intensive on the worker.
Filing to see how frequently this happens.
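One rough way to track frequency would be to scan saved Jenkins console output for the failure signature shown in the log above. The following is only a minimal sketch and was not part of the original report: the function name `count_disconnects` is hypothetical, and it assumes the console text has already been saved or downloaded by some other means.

```python
import re
from collections import Counter

# Signature copied from the console output quoted in this issue.
# \s+ tolerates a line break between the exception class and its message.
DISCONNECT_RE = re.compile(
    r"java\.io\.IOException:\s+Backing channel '([^']+)' is disconnected"
)


def count_disconnects(console_text):
    """Return a Counter mapping worker name to number of disconnect errors.

    `console_text` is assumed to be the plain-text console log of one build.
    """
    return Counter(DISCONNECT_RE.findall(console_text))


if __name__ == "__main__":
    sample = (
        "15:27:21 FATAL: command execution failed\n"
        "15:27:21 java.io.IOException: Backing channel "
        "'apache-beam-jenkins-10' is disconnected.\n"
    )
    for worker, hits in count_disconnects(sample).items():
        print(worker, hits)
```

Summing the counters across builds would give a per-worker tally, which might help tell a single flaky worker apart from a load-related problem.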
Imported from Jira [BEAM-14407](https://issues.apache.org/jira/browse/BEAM-14407). Original Jira may contain additional context.
Reported by: tvalentyn.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org