Posted to dev@beam.apache.org by Chad Dombrova <ch...@gmail.com> on 2020/09/10 00:27:00 UTC

Modifying pip install behavior / custom pypi index

Hi all,
We are running into problems trying to use our own pypi mirror with Beam.
For those who are not well versed in the esoterica of python package
management, pip provides a few ways to specify urls for the pypi index
server:

   - command line [1]: via --index-url
   - environment variables [2]: via PIP_INDEX_URL. In Beam, we don’t have
   any way to influence the environment of the boot process that runs pip
   install.
   - pip.conf [3]: we could provide this as an artifact, but we don’t have
   any way of placing it in the correct location (e.g. /etc/pip.conf) on
   the instance that runs pip install.
   - requirements.txt files can specify certain pip install flags [4],
   such as --index-url. As such, passing a requirements file via
   --requirements_file would theoretically work, but we also want to be
   able to provide dev packages as wheels via --extra_package, which would
   be installed independently from the requirements file and thus use the
   default pypi index. We may be able to upload our wheel as an artifact and
   refer to it using a local path in the requirements file, but this solution
   seems a bit brittle as the local artifacts path is different for each job.
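
To make the options above concrete, here is a minimal Python sketch of the
command-line, environment-variable, and requirements-file mechanisms. It only
builds the command, environment, and file contents and does not run pip. The
mirror URL and all function names are hypothetical and chosen for
illustration; none of this is Beam code.

```python
import os

# Hypothetical mirror URL, used only for illustration.
MIRROR_URL = "https://pypi.example.com/simple"


def pip_command_with_index(packages, index_url=MIRROR_URL):
    """Command-line option: pass the index via --index-url."""
    return ["python", "-m", "pip", "install", "--index-url", index_url, *packages]


def pip_env_with_index(index_url=MIRROR_URL, base_env=None):
    """Environment option: set PIP_INDEX_URL for the process that runs pip."""
    # Copy the environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_INDEX_URL"] = index_url
    return env


def requirements_with_index(requirements, index_url=MIRROR_URL):
    """Requirements-file option: --index-url on its own line in the file."""
    lines = [f"--index-url {index_url}", *requirements]
    return "\n".join(lines) + "\n"
```

The first two helpers would only help if we controlled how the boot process
invokes pip, which is exactly what we cannot do today; the third is the one
that works with --requirements_file but does not cover --extra_package.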

Are there any known solutions to this problem? Here are some ideas:

   - add support for providing a pip.conf as a known artifact type (akin to
   --requirements_file). This is by far the most powerful and
   straightforward solution, but do we have the stomach for yet another CLI
   option?
   - add support for providing a destination path for artifacts, which
   would let us install it into /etc/pip.conf. I can see strong
   safety/security concerns around this.
   - provide a guarantee that the working directory for the boot process is
   inside the artifact directory: then we could refer to wheels inside our
   requirements file using relative paths.
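
As a rough sketch of the first idea: once a pip.conf has been staged as an
artifact, the process that runs pip could be pointed at it through pip's
PIP_CONFIG_FILE environment variable, without writing to /etc/pip.conf. The
Python below is only an illustration under that assumption (the real boot
code is Go, in boot.go); the function names and the example URL are
hypothetical.

```python
import configparser
import os


def write_pip_conf(path, index_url):
    """Write a minimal pip.conf with a [global] index-url setting."""
    # interpolation=None so that a '%' in the URL is written literally.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"index-url": index_url}
    with open(path, "w") as f:
        config.write(f)


def env_for_pip(conf_path, base_env=None):
    """Return an environment that makes pip read the given config file."""
    # Copy the environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_CONFIG_FILE"] = conf_path
    return env
```

A caller would pass the result of env_for_pip(...) as the env argument of
subprocess.run when invoking pip, so that both the requirements-file install
and the --extra_package installs pick up the same index.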

We're happy to make a pull request to add support for this feature, but
it'd be great to have some input on the ideal solution before we begin.

thanks!
-chad

[1] https://pip.pypa.io/en/stable/reference/pip_install/#install-index-url
[2] https://pip.pypa.io/en/stable/user_guide/#environment-variables
[3] https://pip.pypa.io/en/stable/user_guide/#config-file
[4] https://pip.pypa.io/en/stable/reference/pip_install/#requirements-file-format

Re: Modifying pip install behavior / custom pypi index

Posted by Ahmet Altay <al...@google.com>.
On Fri, Sep 11, 2020 at 3:02 PM Robert Bradshaw <ro...@google.com> wrote:

> The long term goal is for Dataflow to use the external containers rather
> than its own. Hopefully this happens sooner rather than later, and until
> then you can specify the beam container as a custom container.
>
> On Fri, Sep 11, 2020 at 2:58 PM Chad Dombrova <ch...@gmail.com> wrote:
>
>> Ok great.  Next question:
>>
>> What is the relationship between sdks/python/container/boot.go and
>> Dataflow?  Is this file used within the Dataflow bootstrapping process?
>>
>> We're currently investigating a switch from Flink to Dataflow, and in
>> doing so we hope to be able to work our way back to using stock Dataflow
>> containers wherever possible.  If we make this PR to add pip.conf support,
>> those changes will be largely made in boot.go, and we'd just like to
>> confirm that our updates will also make it into Dataflow, verbatim.
>>
>
In addition to Robert's answer: Dataflow uses a similar boot.go file. If
there is a delay in switching Dataflow to use Beam containers, we
_might_ be able to apply the changes from the PR to Dataflow's boot.go file.



Re: Modifying pip install behavior / custom pypi index

Posted by Robert Bradshaw <ro...@google.com>.
The long term goal is for Dataflow to use the external containers rather
than its own. Hopefully this happens sooner rather than later, and until
then you can specify the beam container as a custom container.

On Fri, Sep 11, 2020 at 2:58 PM Chad Dombrova <ch...@gmail.com> wrote:

> Ok great.  Next question:
>
> What is the relationship between sdks/python/container/boot.go and
> Dataflow?  Is this file used within the Dataflow bootstrapping process?
>
> We're currently investigating a switch from Flink to Dataflow, and in
> doing so we hope to be able to work our way back to using stock Dataflow
> containers wherever possible.  If we make this PR to add pip.conf support,
> those changes will be largely made in boot.go, and we'd just like to
> confirm that our updates will also make it into Dataflow, verbatim.
>
> -chad

Re: Modifying pip install behavior / custom pypi index

Posted by Chad Dombrova <ch...@gmail.com>.
Ok great.  Next question:

What is the relationship between sdks/python/container/boot.go and
Dataflow?  Is this file used within the Dataflow bootstrapping process?

We're currently investigating a switch from Flink to Dataflow, and in doing
so we hope to be able to work our way back to using stock Dataflow
containers wherever possible.  If we make this PR to add pip.conf support,
those changes will be largely made in boot.go, and we'd just like to
confirm that our updates will also make it into Dataflow, verbatim.

-chad




Re: Modifying pip install behavior / custom pypi index

Posted by Ahmet Altay <al...@google.com>.
On Fri, Sep 11, 2020 at 2:11 PM Robert Bradshaw <ro...@google.com> wrote:

> Hmm... this is a difficult question. I think adding support for a pip.conf
> probably makes the most sense, despite it being yet another option.
>

+1 - I think this is a good flag to add. I have heard similar user requests
for passing specific flags to pip before. Supporting this in a generic way
with an optional flag would address those requests.



Re: Modifying pip install behavior / custom pypi index

Posted by Robert Bradshaw <ro...@google.com>.
Hmm... this is a difficult question. I think adding support for a pip.conf
probably makes the most sense, despite it being yet another option.

Another alternative is to simply pre-install the dependencies you want (or
even just override /etc/pip.conf) in a custom container.
