Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/19 16:21:32 UTC

[GitHub] [beam] RobMcKiernan opened a new issue, #25085: [Bug]: Dependencies from private repositories unable to be seen

RobMcKiernan opened a new issue, #25085:
URL: https://github.com/apache/beam/issues/25085

   ### What happened?
   
   
   When running a GCP Dataflow job with the Python SDK 2.44.0, I can no longer access my private repositories. It works on 2.43.0.
   
   My setup is as follows:
   ```docker
   FROM my.private.repourl/python:3.8-slim-builder as builder
   FROM my.private.repourl/python:3.8-slim
   COPY --from=apache/beam_python3.8_sdk:2.43.0 /opt/apache/beam /opt/apache/beam
   # this virtual env has all the dependencies I need pre-installed on it
   COPY --from=builder $VENV_PATH $VENV_PATH
   ENTRYPOINT ["/opt/apache/beam/boot"]
   ```
   
   This is my run command:
   ```sh
   poetry run python -m projname.main \
     --project="$PROJECT_ID" \
     --runner=DataFlowRunner \
     --temp_location=gs://"$BUCKET_NAME"/temp \
     --region="$REGION" \
     --job_name="$JOB_NAME" \
     --setup_file=./setup.py \
     --subnetwork https://www.googleapis.com/compute/v1/projects/"$PROJECT_ID"/regions/"$REGION"/subnetworks/"$SUBNET" \
     --experiment=use_runner_v2 \
     --sdk_container_image="$IMAGE_NAME" \
     --template_location=gs://"$BUCKET_NAME"/templates/"$JOB_NAME"
   ```
   
   
   According to my Dataflow worker logs, the worker fails to see my private repos:
   ```
   ERROR: Could not find a version that satisfies the requirement package-i-want<3.0.0,>=2.2.0 (from name-of-my-dataflow) (from versions: none)
   ```
   
   I think this is the culprit PR: https://github.com/apache/beam/pull/23684/files#diff-cc1f3d7f808c692a6102847bec78809f2e4350c5ee34278100ce0f55d8c23d68R234
   
   
   
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] RobMcKiernan commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "RobMcKiernan (via GitHub)" <gi...@apache.org>.
RobMcKiernan commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527212805

   Yep, that worked! My new Dockerfile, in case it helps anyone:
   ```
   # This image is just a thin wrapper around the standard Python 3.10 slim image. It should work just fine with the standard image.
   FROM eu.gcr.io/my-proj/python:3.10-slim
   SHELL ["/bin/bash", "-o", "pipefail", "-c"]
   COPY --from=apache/beam_python3.10_sdk:2.46.0 /opt/apache/beam /opt/apache/beam
   
   ENV PYTHONUNBUFFERED=1 \
       PYTHONDONTWRITEBYTECODE=1 \
       PIP_NO_CACHE_DIR=off \
       PIP_DISABLE_PIP_VERSION_CHECK=on \
       PIP_DEFAULT_TIMEOUT=100 \
       POETRY_NO_INTERACTION=1 \
       PATH=/usr/lib/google-cloud-sdk/bin:$PATH
   
   WORKDIR /app
   
   # -- Omitted Section to sort out my gcloud authentication, which I'm not including out of paranoia --
   
   RUN pip install --no-cache-dir \
           poetry \
           keyring \
           keyrings.google-artifactregistry-auth
   
   COPY ./pyproject.toml ./poetry.lock ./
   
   # setting virtualenvs.create to false prevents poetry using venvs as
   # beam >2.43 uses global python packages only
   RUN poetry config virtualenvs.create false \
    && poetry install --no-cache --no-root --only main \
    && rm -rf /root/.cache
   
   ENTRYPOINT ["/opt/apache/beam/boot"]
   ```
   
   tl;dr for anyone skipping to the end: Make sure your python packages are installed in `/usr/local/lib/python<version number>/site-packages` in your docker container.
   
   Cheers for your help everyone! Should I close, or would you like it kept open? I guess at a minimum this should be documented somewhere.
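
To check the tl;dr above from inside a custom container, a small script along these lines can report whether the interpreter is a virtual environment and whether a module resolves from its site-packages directory. This is a minimal sketch and not part of Beam: the helper names (`in_virtualenv`, `interpreter_site_packages`, `installed_in_site_packages`) are made up for illustration, and `apache_beam` is only an example module name.

```python
import importlib.util
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv, not the base install."""
    return sys.prefix != sys.base_prefix


def interpreter_site_packages() -> Path:
    """The site-packages ('purelib') directory of the running interpreter."""
    return Path(sysconfig.get_paths()["purelib"]).resolve()


def installed_in_site_packages(module_name: str) -> bool:
    """True when `module_name` resolves to a file under this interpreter's site-packages."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return False
    # Built-in, frozen, and namespace modules have no file location to compare.
    if spec is None or not spec.origin or spec.origin in ("built-in", "frozen"):
        return False
    return interpreter_site_packages() in Path(spec.origin).resolve().parents


if __name__ == "__main__":
    print("running inside a venv:", in_virtualenv())
    print("site-packages:", interpreter_site_packages())
    # Example module name only; replace with the packages your pipeline needs.
    for name in ["apache_beam"]:
        print(f"{name}: in site-packages = {installed_in_site_packages(name)}")
```

Run it with the image's default `python`. If the first line reports a venv, or a required package reports `False`, the package is probably not in the global environment.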


[GitHub] [beam] tvalentyn closed issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn closed issue #25085: [Bug]: Dependencies from private repositories unable to be seen
URL: https://github.com/apache/beam/issues/25085


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1426531049

   I see. It looks like you may be copying the site-packages directory from a different virtual environment. A recent change creates one virtual environment per SDK process: https://github.com/apache/beam/pull/16658
   
   You may have been affected by this change if you have been using a non-default virtual environment to store your packages.
   
   Note that dependencies installed in the global Python environment should still be accessible in the individual Python environments that are created as of https://github.com/apache/beam/pull/16658.
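
The behavior described above matches how the standard library's `venv` module works when system site-packages are enabled. A venv created that way can import packages from the base interpreter's global site-packages, but it knows nothing about a separate environment such as `/venv`. The sketch below only illustrates that stdlib mechanism. Whether Beam's boot process uses exactly this option is an assumption, not something stated in this thread, and `create_env` is a hypothetical helper name.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path, see_global_packages: bool) -> dict:
    """Create a venv at `target` and return the parsed contents of its pyvenv.cfg."""
    # with_pip=False keeps the example fast and avoids needing ensurepip.
    builder = venv.EnvBuilder(system_site_packages=see_global_packages, with_pip=False)
    builder.create(target)
    config = {}
    for line in (target / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = create_env(Path(tmp) / "shared", see_global_packages=True)
        isolated = create_env(Path(tmp) / "isolated", see_global_packages=False)
        # The "shared" env falls back to the base interpreter's site-packages.
        print("shared include-system-site-packages:", shared["include-system-site-packages"])
        print("isolated include-system-site-packages:", isolated["include-system-site-packages"])
```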


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1398865657

   Ack, thanks. I'll try to get some eyes on this.


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1400747109

   FWIW, if there is a regression between versions, it should be possible to bisect the regression to an exact commit.


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527717699

   Looks like I missed the second diff.


[GitHub] [beam] riteshghorse commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "riteshghorse (via GitHub)" <gi...@apache.org>.
riteshghorse commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1400644538

   Are your private dependencies listed in `requirements.txt` somehow and not pulled locally when running the job?


[GitHub] [beam] RobMcKiernan commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "RobMcKiernan (via GitHub)" <gi...@apache.org>.
RobMcKiernan commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1433105556

   Ah ok, yep that sounds like it could be the culprit then.
   
   I've noticed that the Dataflow docs use `pip` to install Python packages, whereas I'm using poetry. I wonder if that plays into this? https://cloud.google.com/dataflow/docs/guides/using-custom-containers#prebuild
   
   The env var `VENV_PATH` in my `COPY --from=builder $VENV_PATH $VENV_PATH` line is set to `/venv`, if that helps illuminate anything.


[GitHub] [beam] RobMcKiernan commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "RobMcKiernan (via GitHub)" <gi...@apache.org>.
RobMcKiernan commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1408729605

   Sorry I've been away for the past week.
   
   > are your private dependencies listed in `requirements.txt` somehow and not pulled locally when running the job?
   
   I don't use a `requirements.txt`. Instead I use [poetry](https://python-poetry.org/), which creates a `poetry.lock` file that serves a similar purpose to a `requirements.txt`. I have verified that my local poetry virtual env has my private Python packages installed in it.
   
   The other part to this is that I've created a base Docker container for my workers on GCP to use. The private Docker image referred to in my Dockerfile, `FROM my.private.image-repo/python:3.8-slim-builder as builder`, has access to my private Python repositories (I've verified this by pulling the Docker image myself and exec-ing into it). It seems that this is the point at which access to my private repos fails.
   
   @tvalentyn no, I'm afraid my python version has remained constant.
   


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527410075

   You could modify CHANGES.md to further document suggestions/instructions pertaining to the change in behavior in 2.44.0 if you'd like.


[GitHub] [beam] RobMcKiernan commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "RobMcKiernan (via GitHub)" <gi...@apache.org>.
RobMcKiernan commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527495787

   I just tried raising a PR, but it appears that I don't have the needed permissions to push to this repo. This is the diff of my PR:
   ```diff
   diff --git a/CHANGES.md b/CHANGES.md
   index 871f24bf9d..c7578a8a61 100644
   --- a/CHANGES.md
   +++ b/CHANGES.md
   @@ -254,6 +254,8 @@
      runner (such as Dataflow Runner v2) will need to provide this package and its dependencies.
    * Slices now use the Beam Iterable Coder. This enables cross language use, but breaks pipeline updates
      if a Slice type is used as a PCollection element or State API element. (Go)[#24339](https://github.com/apache/beam/issues/24339)
   +* Custom worker Dockerfiles must now install their dependencies in the global python environment. For example, when using poetry
   +  you must use `poetry config virtualenvs.create false` before installing deps [#25085](https://github.com/apache/beam/issues/25085)
    
    ## Deprecations
    
   diff --git a/website/www/site/content/en/documentation/runtime/environments.md b/website/www/site/content/en/documentation/runtime/environments.md
   index 17ee452a57..46a7f69209 100644
   --- a/website/www/site/content/en/documentation/runtime/environments.md
   +++ b/website/www/site/content/en/documentation/runtime/environments.md
   @@ -198,6 +198,7 @@ Beam offers a way to provide your own custom container image. The easiest way to
    >The version specified in the `RUN` instruction must match the version used to launch the pipeline.<br>
    >**Make sure that the Python or Java runtime version specified in the base image is the same as the version used to run the pipeline.**
    
   +>**NOTE**: When using version >=2.44.0 you must ensure dependencies are installed in the global python environment in the resulting image 
    
    2. [Build](https://docs.docker.com/engine/reference/commandline/build/) and [push](https://docs.docker.com/engine/reference/commandline/push/) the image using Docker.
      ```
   
   ```


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527716927

   Thanks a lot!


[GitHub] [beam] RobMcKiernan commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "RobMcKiernan (via GitHub)" <gi...@apache.org>.
RobMcKiernan commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1525791480

   I'm back working on this now. I tried altering my `PYTHONPATH` in my Dockerfile, but that didn't seem to work, although I'm not quite sure why.
   
   I'm now experimenting using `poetry config virtualenvs.create false` to install my packages in the global python environment. I'll let you know how it goes.


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527716787

   No problem. You might have to fork the repo first to create PRs. Sent you https://github.com/apache/beam/pull/26471


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1527411570

   Glad to hear you resolved the issue.


[GitHub] [beam] Abacn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1398850520

   CC: @robertwb 
   CC: @tvalentyn 
   
   Sounds like a regression. Is there a workaround to mitigate this?


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1400748999

   > ERROR: Could not find a version that satisfies the requirement package-i-want<3.0.0,>=2.2.0 (from name-of-my-dataflow) (from versions: none)
   
   Re: 'from versions: none': just to double-check, when you changed the Beam version, did you by chance also change the version of the Python interpreter? Could you confirm that it didn't change?


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1426534655

   I think 2.44.0 is the first release that includes https://github.com/apache/beam/pull/16658, which matches the timing you describe.


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1433943540

   The global environment will have packages installed in /usr/local/lib/python3.8/site-packages. If you activate a custom venv, I think it will be ignored now that the code path has changed in https://github.com/apache/beam/pull/16658 and each Python process creates its own environment.
   
   I suppose you could try to manipulate the PYTHONPATH variable to include your environment, but that may be brittle if you have package mismatches.
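
The "package mismatches" risk mentioned above can be checked before touching `PYTHONPATH`, by comparing which distributions two site-packages directories provide. This is a rough sketch using only the standard library. The helper names and the two example paths (a global directory and a `/venv` directory) are assumptions for illustration, and the comparison only lower-cases names rather than applying full name normalization.

```python
from importlib import metadata
from pathlib import Path
from typing import Dict, Tuple


def installed_versions(site_packages: Path) -> Dict[str, str]:
    """Map distribution name (lower-cased) to version for one site-packages directory."""
    versions = {}
    for dist in metadata.distributions(path=[str(site_packages)]):
        name = dist.metadata.get("Name")
        if name:
            versions[name.lower()] = dist.version
    return versions


def version_mismatches(first: Path, second: Path) -> Dict[str, Tuple[str, str]]:
    """Distributions present in both directories but at different versions."""
    a, b = installed_versions(first), installed_versions(second)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


if __name__ == "__main__":
    # Hypothetical locations; adjust to the Python version and venv path in your image.
    global_dir = Path("/usr/local/lib/python3.8/site-packages")
    venv_dir = Path("/venv/lib/python3.8/site-packages")
    for name, (global_version, venv_version) in version_mismatches(global_dir, venv_dir).items():
        print(f"{name}: global={global_version} venv={venv_version}")
```

Any package printed here would be importable at two different versions depending on `sys.path` order, which is the kind of brittleness the comment warns about.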


[GitHub] [beam] riteshghorse commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "riteshghorse (via GitHub)" <gi...@apache.org>.
riteshghorse commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1398944434

   I looked at the mentioned culprit PR and I don't think it's quite the culprit, because it is not discarding anything that used to work earlier. I'll take a closer look at the bug for other possibilities.


[GitHub] [beam] tvalentyn commented on issue #25085: [Bug]: Dependencies from private repositories unable to be seen

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25085:
URL: https://github.com/apache/beam/issues/25085#issuecomment-1525828919

   sg, thanks.

