Posted to dev@beam.apache.org by Ahmet Altay <al...@google.com> on 2017/12/20 23:10:49 UTC

Pushing daily/test containers for python

Hi all,

After some recent changes (e.g. [1]) we have a feasible container that we
can use to test the Python SDK on the portability framework. Until now we
have been using Google-provided container images both for testing and for
the released product. We can gradually move away from that (at least
partially) for the Python SDK.

I would like to propose building containers for testing purposes only and
pushing them to gcr.io as part of Jenkins jobs. I would like to clarify two
points with the team first:

1. Use of GCR: I am proposing it for a few reasons:
- Beam's Jenkins workers run on GCP, and it would be easy to push images to
GCR from there.
- If we use another service (perhaps with a free tier for open source
projects) we might be overusing it by pushing/pulling from our daily tests.
- This is similar to how we stage some artifacts to GCS as part of the
testing process.

2. Frequency of building and pushing containers

a. We can run it at every PR, by integrating with the Python post-commit tests.
b. We can run it daily, by having a new Jenkins job.
c. We can run it manually, by having a parameterized Jenkins job that can
build and push a new container from a tag/commit. Given that we
infrequently change container code, I would suggest choosing this option.
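
To make this concrete, here is a minimal sketch of what the build-and-push
step could do under any of (a)-(c). The project id, image name, and
container directory are placeholders I am assuming rather than settled
choices, and the step would run on a Jenkins worker that already has push
credentials for gcr.io:

    # Sketch of a build-and-push step; names and paths are assumptions.
    import subprocess
    import sys

    PROJECT = "apache-beam-testing"          # assumed GCP project for test images
    CONTAINER_DIR = "sdks/python/container"  # assumed location of the Dockerfile

    def build_and_push(tag):
        image = "gcr.io/%s/python-sdk-harness:%s" % (PROJECT, tag)
        # Build the SDK harness image from the checked-out commit.
        subprocess.check_call(["docker", "build", "-t", image, CONTAINER_DIR])
        # Push to GCR; assumes docker on the worker is already authenticated.
        subprocess.check_call(["docker", "push", image])

    if __name__ == "__main__":
        build_and_push(sys.argv[1] if len(sys.argv) > 1 else "latest")

For (c) the tag would come from the job's tag/commit parameter; for (a) and
(b) it could default to the commit being tested.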

What do you think about this? To be clear, this is just a proposal about
the testing environment. I am not suggesting anything about the release
artifacts.

Thank you,
Ahmet

[1] https://github.com/apache/beam/pull/4286

Re: Pushing daily/test containers for python

Posted by Ahmet Altay <al...@google.com>.
Thank you all for the comments. We can prototype something closer to (a)
and we can always change it later. My concern was that this would consume
more resources, but this might be a non-issue.

From a procedure perspective, do we need a formal vote on this?

On Thu, Dec 21, 2017 at 1:33 PM, Holden Karau <ho...@pigscanfly.ca> wrote:

> So I think we (or more accurately the PMC) need to be careful with how we
> post the container artifacts from an Apache POV since they most likely
> contain non-Apache licensed code (and also posting dailies can be
> complicated since the PMC hasn’t voted on each one).
>

> For just testing it should probably be OK but we need to make sure users
> aren’t confused and think they are releases.
>

+1. Perhaps we can make these images private or have mechanisms in the
tests to remove images as part of the test cleanup.
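
For the cleanup idea, a minimal sketch (the image path is a placeholder,
and I am assuming the gcloud flags behave as described) could be:

    # Sketch: delete the per-run image tag from GCR once the tests finish.
    import subprocess

    def delete_test_image(tag, project="apache-beam-testing"):
        image = "gcr.io/%s/python-sdk-harness:%s" % (project, tag)
        subprocess.check_call([
            "gcloud", "container", "images", "delete", image,
            "--force-delete-tags",  # needed when the image is still referenced by a tag
            "--quiet",              # skip the interactive confirmation on Jenkins
        ])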


>
>
> On Thu, Dec 21, 2017 at 10:03 AM Valentyn Tymofieiev <va...@google.com>
> wrote:
>
>> The GCR repository can be configured with public pull access, which I
>> think will be required to use the container.
>>
>> On Thu, Dec 21, 2017 at 2:34 AM, David Sabater Dinter <
>> david.sabater@gmail.com> wrote:
>>
>>> +1
>>> Hi,
>>> It makes sense to use GCR (it has locality with other GCP services and
>>> works like any other container repository). The only caveat is that the
>>> images will be private, so anyone who needs to debug locally will either
>>> need access to pull the image or will have to build it locally and push.
>>> I agree that getting closer to (a) is preferable, assuming the build time
>>> doesn't increase dramatically in the post-commit process.
>>>
>>> On Thu, Dec 21, 2017 at 1:59 AM Henning Rohde <he...@google.com>
>>> wrote:
>>>
>>>> +1
>>>>
>>>> It would be great to be able to test this aspect of portability. For
>>>> testing purposes, I think whatever container registry is convenient to use
>>>> for distribution is fine.
>>>>
>>>> Regarding frequency, I think we should consider something closer to
>>>> (a). The container images -- although usually quite stable -- are part of
>>>> the SDK at that commit and are not guaranteed to work with any other
>>>> version. Breaking changes in their interaction would cause confusion and
>>>> create noise. Any local tests can also in theory just build the container
>>>> images directly and not use any registry, so it might make sense to set up
>>>> the tests so that pushing occurs less frequently than building.
>>>>
>>>> Henning
>>>>
>>>>
>>>>
>>>> On Wed, Dec 20, 2017 at 3:10 PM, Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> After some recent changes (e.g. [1]) we have a feasible container that
>>>>> we can use to test Python SDK on portability framework. Until now we were
>>>>> using Google provided container images for testing and for the released
>>>>> product. We can gradually move away from that (at least partially) for
>>>>> Python SDK.
>>>>>
>>>>> I would like to propose building containers for testing purposes only
>>>>> and pushing them to gcr.io as part of jenkins jobs. I would like to
>>>>> clarify two points with the team first:
>>>>>
>>>>> 1. Use of GCR, I am proposing it for a few reasons:
>>>>> - Beam's jenkins workers run on GCP, and it would be easy to push them
>>>>> to gcr from there.
>>>>> - If we use another service (perhaps with a free tier for open source
>>>>> projects) we might be overusing it by pushing/pulling from our daily tests.
>>>>> - This is similar to how we stage some artifacts to GCS as part of the
>>>>> testing process.
>>>>>
>>>>> 2. Frequency of building and pushing containers
>>>>>
>>>>> a. We can run it at every PR, by integrating with python post commit
>>>>> tests.
>>>>> b. We can run it daily, by having a new Jenkins job.
>>>>> c. We can run it manually, by having a parameterized Jenkins job that
>>>>> can build and push a new container from a tag/commit. Given that we
>>>>> infrequently change container code, I would suggest choosing this option.
>>>>>
>>>>> What do you think about this? To be clear, this is just a proposal
>>>>> about the testing environment. I am not suggesting anything about the
>>>>> release artifacts.
>>>>>
>>>>> Thank you,
>>>>> Ahmet
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/4286
>>>>>
>>>>
>>>>
>> --
> Twitter: https://twitter.com/holdenkarau
>

Re: Pushing daily/test containers for python

Posted by Holden Karau <ho...@pigscanfly.ca>.
So I think we (or more accurately the PMC) need to be careful with how we
post the container artifacts from an Apache POV since they most likely
contain non-Apache licensed code (and also posting dailies can be
complicated since the PMC hasn’t voted on each one).

For just testing it should probably be OK but we need to make sure users
aren’t confused and think they are releases.


On Thu, Dec 21, 2017 at 10:03 AM Valentyn Tymofieiev <va...@google.com>
wrote:

> The GCR repository can be configured with public pull access, which I
> think will be required to use the container.
>
> On Thu, Dec 21, 2017 at 2:34 AM, David Sabater Dinter <
> david.sabater@gmail.com> wrote:
>
>> +1
>> Hi,
>> It makes sense to use GCR (it has locality with other GCP services and
>> works like any other container repository). The only caveat is that the
>> images will be private, so anyone who needs to debug locally will either
>> need access to pull the image or will have to build it locally and push.
>> I agree that getting closer to (a) is preferable, assuming the build time
>> doesn't increase dramatically in the post-commit process.
>>
>> On Thu, Dec 21, 2017 at 1:59 AM Henning Rohde <he...@google.com> wrote:
>>
>>> +1
>>>
>>> It would be great to be able to test this aspect of portability. For
>>> testing purposes, I think whatever container registry is convenient to use
>>> for distribution is fine.
>>>
>>> Regarding frequency, I think we should consider something closer to (a).
>>> The container images -- although usually quite stable -- are part of the
>>> SDK at that commit and are not guaranteed to work with any other version.
>>> Breaking changes in their interaction would cause confusion and create
>>> noise. Any local tests can also in theory just build the container images
>>> directly and not use any registry, so it might make sense to set up the
>>> tests so that pushing occurs less frequently than building.
>>>
>>> Henning
>>>
>>>
>>>
>>> On Wed, Dec 20, 2017 at 3:10 PM, Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> After some recent changes (e.g. [1]) we have a feasible container that
>>>> we can use to test Python SDK on portability framework. Until now we were
>>>> using Google provided container images for testing and for the released
>>>> product. We can gradually move away from that (at least partially) for
>>>> Python SDK.
>>>>
>>>> I would like to propose building containers for testing purposes only
>>>> and pushing them to gcr.io as part of jenkins jobs. I would like to
>>>> clarify two points with the team first:
>>>>
>>>> 1. Use of GCR, I am proposing it for a few reasons:
>>>> - Beam's jenkins workers run on GCP, and it would be easy to push them
>>>> to gcr from there.
>>>> - If we use another service (perhaps with a free tier for open source
>>>> projects) we might be overusing it by pushing/pulling from our daily tests.
>>>> - This is similar to how we stage some artifacts to GCS as part of the
>>>> testing process.
>>>>
>>>> 2. Frequency of building and pushing containers
>>>>
>>>> a. We can run it at every PR, by integrating with python post commit
>>>> tests.
>>>> b. We can run it daily, by having a new Jenkins job.
>>>> c. We can run it manually, by having a parameterized Jenkins job that
>>>> can build and push a new container from a tag/commit. Given that we
>>>> infrequently change container code, I would suggest choosing this option.
>>>>
>>>> What do you think about this? To be clear, this is just a proposal
>>>> about the testing environment. I am not suggesting anything about the
>>>> release artifacts.
>>>>
>>>> Thank you,
>>>> Ahmet
>>>>
>>>> [1] https://github.com/apache/beam/pull/4286
>>>>
>>>
>>>
> --
Twitter: https://twitter.com/holdenkarau

Re: Pushing daily/test containers for python

Posted by Valentyn Tymofieiev <va...@google.com>.
The GCR repository can be configured with public pull access, which I think
will be required to use the container.
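
For example, a test pipeline could point at the pushed image roughly like
this (a sketch only: the image path is a placeholder, and the exact option
name depends on the runner and SDK version):

    # Sketch: pass the test container image to the pipeline via an option.
    # The option name below is the Dataflow-style worker image option as I
    # recall it; other runners may spell this differently.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--worker_harness_container_image="
        "gcr.io/apache-beam-testing/python-sdk-harness:latest",  # assumed path
    ])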

On Thu, Dec 21, 2017 at 2:34 AM, David Sabater Dinter <
david.sabater@gmail.com> wrote:

> +1
> Hi,
> It makes sense to use GCR (it has locality with other GCP services and
> works like any other container repository). The only caveat is that the
> images will be private, so anyone who needs to debug locally will either
> need access to pull the image or will have to build it locally and push.
> I agree that getting closer to (a) is preferable, assuming the build time
> doesn't increase dramatically in the post-commit process.
>
> On Thu, Dec 21, 2017 at 1:59 AM Henning Rohde <he...@google.com> wrote:
>
>> +1
>>
>> It would be great to be able to test this aspect of portability. For
>> testing purposes, I think whatever container registry is convenient to use
>> for distribution is fine.
>>
>> Regarding frequency, I think we should consider something closer to (a).
>> The container images -- although usually quite stable -- are part of the
>> SDK at that commit and are not guaranteed to work with any other version.
>> Breaking changes in their interaction would cause confusion and create
>> noise. Any local tests can also in theory just build the container images
>> directly and not use any registry, so it might make sense to set up the
>> tests so that pushing occurs less frequently than building.
>>
>> Henning
>>
>>
>>
>> On Wed, Dec 20, 2017 at 3:10 PM, Ahmet Altay <al...@google.com> wrote:
>>
>>> Hi all,
>>>
>>> After some recent changes (e.g. [1]) we have a feasible container that
>>> we can use to test Python SDK on portability framework. Until now we were
>>> using Google provided container images for testing and for the released
>>> product. We can gradually move away from that (at least partially) for
>>> Python SDK.
>>>
>>> I would like to propose building containers for testing purposes only
>>> and pushing them to gcr.io as part of jenkins jobs. I would like to
>>> clarify two points with the team first:
>>>
>>> 1. Use of GCR, I am proposing it for a few reasons:
>>> - Beam's jenkins workers run on GCP, and it would be easy to push them
>>> to gcr from there.
>>> - If we use another service (perhaps with a free tier for open source
>>> projects) we might be overusing it by pushing/pulling from our daily tests.
>>> - This is similar to how we stage some artifacts to GCS as part of the
>>> testing process.
>>>
>>> 2. Frequency of building and pushing containers
>>>
>>> a. We can run it at every PR, by integrating with python post commit
>>> tests.
>>> b. We can run it daily, by having a new Jenkins job.
>>> c. We can run it manually, by having a parameterized Jenkins job that
>>> can build and push a new container from a tag/commit. Given that we
>>> infrequently change container code, I would suggest choosing this option.
>>>
>>> What do you think about this? To be clear, this is just a proposal about
>>> the testing environment. I am not suggesting anything about the release
>>> artifacts.
>>>
>>> Thank you,
>>> Ahmet
>>>
>>> [1] https://github.com/apache/beam/pull/4286
>>>
>>
>>

Re: Pushing daily/test containers for python

Posted by David Sabater Dinter <da...@gmail.com>.
+1
Hi,
It makes sense to use GCR (it has locality with other GCP services and
works like any other container repository). The only caveat is that the
images will be private, so anyone who needs to debug locally will either
need access to pull the image or will have to build it locally and push.
I agree that getting closer to (a) is preferable, assuming the build time
doesn't increase dramatically in the post-commit process.

On Thu, Dec 21, 2017 at 1:59 AM Henning Rohde <he...@google.com> wrote:

> +1
>
> It would be great to be able to test this aspect of portability. For
> testing purposes, I think whatever container registry is convenient to use
> for distribution is fine.
>
> Regarding frequency, I think we should consider something closer to (a).
> The container images -- although usually quite stable -- are part of the
> SDK at that commit and are not guaranteed to work with any other version.
> Breaking changes in their interaction would cause confusion and create
> noise. Any local tests can also in theory just build the container images
> directly and not use any registry, so it might make sense to set up the
> tests so that pushing occurs less frequently than building.
>
> Henning
>
>
>
> On Wed, Dec 20, 2017 at 3:10 PM, Ahmet Altay <al...@google.com> wrote:
>
>> Hi all,
>>
>> After some recent changes (e.g. [1]) we have a feasible container that we
>> can use to test Python SDK on portability framework. Until now we were
>> using Google provided container images for testing and for the released
>> product. We can gradually move away from that (at least partially) for
>> Python SDK.
>>
>> I would like to propose building containers for testing purposes only and
>> pushing them to gcr.io as part of jenkins jobs. I would like to clarify
>> two points with the team first:
>>
>> 1. Use of GCR, I am proposing it for a few reasons:
>> - Beam's jenkins workers run on GCP, and it would be easy to push them to
>> gcr from there.
>> - If we use another service (perhaps with a free tier for open source
>> projects) we might be overusing it by pushing/pulling from our daily tests.
>> - This is similar to how we stage some artifacts to GCS as part of the
>> testing process.
>>
>> 2. Frequency of building and pushing containers
>>
>> a. We can run it at every PR, by integrating with python post commit
>> tests.
>> b. We can run it daily, by having a new Jenkins job.
>> c. We can run it manually, by having a parameterized Jenkins job that can
>> build and push a new container from a tag/commit. Given that we
>> infrequently change container code, I would suggest choosing this option.
>>
>> What do you think about this? To be clear, this is just a proposal about
>> the testing environment. I am not suggesting anything about the release
>> artifacts.
>>
>> Thank you,
>> Ahmet
>>
>> [1] https://github.com/apache/beam/pull/4286
>>
>
>

Re: Pushing daily/test containers for python

Posted by Henning Rohde <he...@google.com>.
+1

It would be great to be able to test this aspect of portability. For
testing purposes, I think whatever container registry is convenient to use
for distribution is fine.

Regarding frequency, I think we should consider something closer to (a).
The container images -- although usually quite stable -- are part of the
SDK at that commit and are not guaranteed to work with any other version.
Breaking changes in their interaction would cause confusion and create
noise. Any local tests can also in theory just build the container images
directly and not use any registry, so it might make sense to set up the
tests so that pushing occurs less frequently than building.
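
One way to keep pushed images clearly tied to the SDK commit they came from
(just a sketch; the image path is a placeholder) would be to tag them with
the commit hash:

    # Sketch: derive the image tag from the commit being tested.
    import subprocess

    def image_for_head(project="apache-beam-testing"):
        commit = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).strip().decode("ascii")
        return "gcr.io/%s/python-sdk-harness:%s" % (project, commit)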

Henning



On Wed, Dec 20, 2017 at 3:10 PM, Ahmet Altay <al...@google.com> wrote:

> Hi all,
>
> After some recent changes (e.g. [1]) we have a feasible container that we
> can use to test Python SDK on portability framework. Until now we were
> using Google provided container images for testing and for the released
> product. We can gradually move away from that (at least partially) for
> Python SDK.
>
> I would like to propose building containers for testing purposes only and
> pushing them to gcr.io as part of jenkins jobs. I would like to clarify
> two points with the team first:
>
> 1. Use of GCR, I am proposing it for a few reasons:
> - Beam's jenkins workers run on GCP, and it would be easy to push them to
> gcr from there.
> - If we use another service (perhaps with a free tier for open source
> projects) we might be overusing it by pushing/pulling from our daily tests.
> - This is similar to how we stage some artifacts to GCS as part of the
> testing process.
>
> 2. Frequency of building and pushing containers
>
> a. We can run it at every PR, by integrating with python post commit tests.
> b. We can run it daily, by having a new Jenkins job.
> c. We can run it manually, by having a parameterized Jenkins job that can
> build and push a new container from a tag/commit. Given that we
> infrequently change container code, I would suggest choosing this option.
>
> What do you think about this? To be clear, this is just a proposal about
> the testing environment. I am not suggesting anything about the release
> artifacts.
>
> Thank you,
> Ahmet
>
> [1] https://github.com/apache/beam/pull/4286
>