Posted to github@beam.apache.org by "chamikaramj (via GitHub)" <gi...@apache.org> on 2023/05/06 08:39:02 UTC

[GitHub] [beam] chamikaramj opened a new issue, #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to download the Python container

chamikaramj opened a new issue, #26576:
URL: https://github.com/apache/beam/issues/26576

   ### What happened?
   
   Java x-lang jobs appear to be failing for Beam 2.47.0 RC3 because the Python SDK harness container cannot be downloaded or set up.
   
   Example:
   
   Container: gcr.io/cloud-dataflow/v1beta3/beam_python3.9_sdk:2.47.0
   
   Job: https://pantheon.corp.google.com/dataflow/jobs/us-central1/2023-05-06_01_31_45-3741497213019956539;graphView=0?project=apache-beam-testing&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))
   
   Errors: https://pantheon.corp.google.com/logs/query;query=resource.type%3D%22dataflow_step%22%0Aresource.labels.job_id%3D%222023-05-06_01_31_45-3741497213019956539%22%0AlogName%3D%22projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fkubelet%22%0Aseverity%3E%3DERROR;cursorTimestamp=2023-05-06T08:35:54.258518Z?project=apache-beam-testing
   
   Error syncing pod, skipping" err="failed to "StartContainer" for "sdk-1-0" with CrashLoopBackOff: "back-off 10s restarting failed container=sdk-1-0 pod=df-pythondataframewordcount--05060131-tupl-harness-pnng_default(9d9011b47f48e0b652f8d16cf81e8f8c)"" pod="default/df-pythondataframewordcount--05060131-tupl-harness-pnng" podUID=9d9011b47f48e0b652f8d16cf81e8f8c
   
   I'm getting the same error when running with the container overridden to use a clone in my private repo, so this is unlikely to be a GCR issue.
   
   The same job works fine with Beam 2.46.0, so there seems to be some issue with the Beam 2.47.0 artifacts.
   
   https://pantheon.corp.google.com/dataflow/jobs/us-central1/2023-05-06_01_15_41-10489984441407428074;bottomTab=WORKER_LOGS;logsSeverity=INFO;graphView=0?project=apache-beam-testing&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))
   
   To reproduce, run the multi-lang quickstart job with a manual expansion service container and using Python 2.47.0 artifacts.
   
   https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/
   
   Creating this as a blocker for the ongoing RC since this breaks a feature that worked for the previous release.
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537220090

   There's a related bug here: https://github.com/apache/beam/issues/24470
   
   According to that issue, the bullseye base image had glibc 2.31 while the SDK harness was linked against a higher version of glibc. The same thing might be happening here.
   
   @lostluck @riteshghorse @jrmccluskey any idea how to resolve this? Should we rebuild the SDK harness container with glibc 2.31 installed?




[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537289504

   Thanks. Also confirmed that the x-lang test passes when https://github.com/apache/beam/pull/26054 is reverted: https://pantheon.corp.google.com/dataflow/jobs/us-central1/2023-05-06_21_46_17-7894509046776240415;graphView=0?project=apache-beam-testing&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))
   
   Forwarding to @lostluck to determine the next steps here.
   Probably we should revert https://github.com/apache/beam/pull/26054 from the release branch in the short term and re-build containers. 




[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537498862

   We also found out that the images built on Jenkins are fine but images built on HEAD may fail depending on the setup of the local machine.
   
   For example,
   
   The following passes:
   
   docker run -it --entrypoint '/opt/apache/beam/boot' gcr.io/apache-beam-testing/beam-sdk/beam_python3.9_sdk:0924840386f473e75324d645e0f0bd466e22dbad
   
   But the following fails on my Linux machine:
   
   (on HEAD)
   
   ./gradlew :sdks:python:container:py39:docker
   docker run -it --entrypoint '/opt/apache/beam/boot' apache/beam_python3.9_sdk:2.48.0.dev
   
   This explains why HEAD tests didn't fail since the PR was submitted.
   
   We currently build Docker images for the release on the release manager's local machine. We need to update the release process to build Docker images in a standard environment that is consistent with the tests, so that we can catch issues like this early and reliably.




[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1541037961

   Reducing the priority since we could unblock the 2.47.0 release by re-building containers in a different environment.




[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537165776

   I see the following logged many times in the worker-startup logs, which might be related.
   
   /opt/apache/beam/boot: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /opt/apache/beam/boot)
   
   https://pantheon.corp.google.com/logs/query;query=resource.type%3D%22dataflow_step%22%0Aresource.labels.job_id%3D%222023-05-06_01_31_45-3741497213019956539%22%0AlogName%3D%22projects%2Fapache-beam-testing%2Flogs%2Fdataflow.googleapis.com%252Fworker-startup%22;timeRange=2023-05-06T08:31:45.956Z%2F2023-05-06T10:30:07.855Z;cursorTimestamp=2023-05-06T10:24:21.376990841Z?project=apache-beam-testing




[GitHub] [beam] lostluck commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1538689304

   So updating to Go 1.20.2 wouldn't have caused that, since Go can only require at best the version of Glibc that's present on the machine doing the compilation. The problem is where/how the boot script was built, which will depend on whatever machine we're running the gradle commands on.
   
   Ultimately, the right solution here is that instead of compiling on the "local" machine, we're probably better off having the boot script built in a clean known environment, likely with CGO_ENABLED=0, and that will avoid bootscript glibc issues.
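
One way to check whether a built boot binary can hit this class of error at all is to look for a PT_INTERP program header: a binary without one does not go through the dynamic loader, so it cannot fail on a missing GLIBC symbol version. The Python sketch below is only an illustration under stated assumptions. The function name is hypothetical, it handles only 64-bit little-endian ELF files, and it assumes that a Go binary built without cgo on Linux is typically statically linked, which may not hold for every build mode.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def requests_dynamic_loader(elf_bytes):
    """Return True if a 64-bit little-endian ELF image has a PT_INTERP segment.

    Hypothetical helper for illustration; not part of Beam or its build.
    """
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] == 2 means ELFCLASS64, e_ident[5] == 1 means little-endian.
    if elf_bytes[4] != 2 or elf_bytes[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # In the ELF64 header, e_phoff is at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", elf_bytes, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf_bytes, 54)
    for index in range(e_phnum):
        # p_type is the first 4-byte field of every program header entry.
        (p_type,) = struct.unpack_from("<I", elf_bytes, e_phoff + index * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

To try it on a real file, read the binary with open(path, "rb").read() and pass the bytes in. A result of True for the boot binary would mean it still depends on the loader and libc of whatever image it runs in.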
   




[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537224873

   BTW this can be easily reproduced by running the following.
   
   docker run -it --platform linux/amd64 --entrypoint '/opt/apache/beam/boot' us-central1-artifactregistry.gcr.io/google.com/dataflow-containers/worker/v1beta3/beam_python3.9_sdk:2.47.0
   
   Which fails with the following error:
   
   /opt/apache/beam/boot: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /opt/apache/beam/boot)
   /opt/apache/beam/boot: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /opt/apache/beam/boot)
   
   Thanks @bvolpato 
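
As a rough illustration of how to read these loader errors, the Python sketch below extracts the GLIBC symbol versions that the binary asks for and compares the highest one with the glibc version of the image. The function names are hypothetical, and the 2.31 figure for the bullseye base image is taken from the related issue mentioned earlier in this thread, not measured here.

```python
import re

# Matches the dynamic loader's complaint, for example:
#   ... version `GLIBC_2.34' not found (required by /opt/apache/beam/boot)
_GLIBC_ERROR = re.compile(r"version `GLIBC_(\d+(?:\.\d+)*)' not found")


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def missing_glibc_versions(log_text):
    """Return the sorted, de-duplicated GLIBC versions the loader could not find."""
    found = {parse_version(m.group(1)) for m in _GLIBC_ERROR.finditer(log_text)}
    return sorted(found)


def image_is_too_old(log_text, image_glibc):
    """True if the log asks for a glibc newer than the one in the image."""
    versions = missing_glibc_versions(log_text)
    return bool(versions) and max(versions) > parse_version(image_glibc)
```

For the two error lines above, the highest requested version is 2.34, which is newer than 2.31, so the loader refuses to start the boot binary.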




[GitHub] [beam] bvolpato commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537281420

   can confirm https://github.com/apache/beam/pull/26054 as the root cause here
   
   ```
   $ git checkout release-2.47.0
   $ ./gradlew :sdks:python:container:py39:docker
   $ docker run -it --entrypoint '/opt/apache/beam/boot' docker.io/apache/beam_python3.9_sdk:2.47.0.dev
   /opt/apache/beam/boot: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /opt/apache/beam/boot)
   /opt/apache/beam/boot: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /opt/apache/beam/boot)
   
   $ git revert 7ee74d2bf7338e82d35e4429e6d21decc1097621
   
   $ ./gradlew :sdks:python:container:py39:docker
   $ docker run -it --entrypoint '/opt/apache/beam/boot' docker.io/apache/beam_python3.9_sdk:2.47.0.dev
   2023/05/07 00:58:56 No id provided.
   ```
   




[GitHub] [beam] lostluck closed issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck closed issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container
URL: https://github.com/apache/beam/issues/26576




[GitHub] [beam] brucearctor commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "brucearctor (via GitHub)" <gi...@apache.org>.
brucearctor commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1537171088

   great catch!




[GitHub] [beam] tvalentyn commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1538458646

   I am not sure I understand why tests running against the release branch didn't catch it. 
   
   




[GitHub] [beam] tvalentyn commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1538453240

   > We also found out that the images built on Jenkins are fine but images built on HEAD may fail depending on the setup of the local machine.
   
   Can the containers be fixed by changing the Go compiler version on the local machine and rebuilding them, or are code changes to the branch necessary?




[GitHub] [beam] chamikaramj commented on issue #26576: [Bug]: Java x-lang jobs are failing for Beam 2.47.0 RC3 due to failing to start the Python container

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on issue #26576:
URL: https://github.com/apache/beam/issues/26576#issuecomment-1538616202

   I believe @jrmccluskey is trying to re-build containers in a different setup.

