Posted to builds@apache.org by Gavin McDonald <gm...@apache.org> on 2020/11/02 09:57:20 UTC

Re: Docker rate limits likely spell DOOM for any Apache project CI workflow relying on Docker Hub

Hi All,

Any project under the 'apache' org on DockerHub is not affected by the
restrictions.

Kind Regards

Gavin "The futures so bright you gotta wear shades" McDonald


On Thu, Oct 29, 2020 at 11:08 PM Gavin McDonald <gm...@apache.org>
wrote:

> Hi,
>
> Just to note: I have emailed DockerHub asking for clarification on our
> account and what our benefits are.
>
>
> On Thu, Oct 29, 2020 at 6:34 PM Allen Wittenauer
> <aw...@effectivemachines.com.invalid> wrote:
>
>>
>> > On Oct 29, 2020, at 9:21 AM, Joan Touzet <wo...@apache.org> wrote:
>> >
>> > (Sidebar about the script's details)
>>
>>         Sure.
>>
>> > I tried to read the shell script, but I'm not in the headspace to fully
>> > parse it at the moment. If I'm understanding correctly, this will still
>> > catch CouchDB's CI docker images if they haven't changed in a week, which
>> > happens often enough, negating the cache.
>>
>>         Correct. We actually tried something similar for a while and
>> discovered that in a lot of cases, upstream packages would disappear (or
>> worse, have security problems), thus making it look like the image is
>> still "good" when it's not.  So a weekly rebuild at least guarantees some
>> level of "yup, still good" without having too much of a negative impact.
>>
>> > As a project, we're kind of stuck between a rock and a hard place. We
>> > want to force a docker pull on the base CI image if it's out of date or
>> > the image is corrupted. Otherwise we want to cache forever, not just for
>> > a week. I can probably manage the "do we need to re-pull?" bit with some
>> > clever CI scripting (check for the latest image hash locally, validate
>> > the local image, pull if either fails) but I don't understand how the
>> > script resolves the latter.
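>> >
>> > The re-pull check might look roughly like this (an untested sketch; the
>> > image name is a placeholder, and it assumes Docker Hub's token endpoint
>> > and registry API, where a HEAD request on a manifest reportedly does
>> > not count as a pull):
>> >
>> >     REPO="apache/couchdbci-debian"; TAG="latest"  # placeholder image
>> >     # Anonymous pull token for Docker Hub's registry API.
>> >     TOKEN=$(curl -fsSL "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${REPO}:pull" \
>> >       | sed -e 's/.*"token":"\([^"]*\)".*/\1/')
>> >     # HEAD the manifest to get its digest without downloading layers.
>> >     REMOTE=$(curl -fsSI -H "Authorization: Bearer ${TOKEN}" \
>> >       -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
>> >       "https://registry-1.docker.io/v2/${REPO}/manifests/${TAG}" \
>> >       | tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" {print $2}')
>> >     # Digest of the local copy, empty if we have none. (Validating the
>> >     # local image, e.g. with a trivial 'docker run', is left out here.)
>> >     LOCAL=$(docker image inspect --format '{{index .RepoDigests 0}}' \
>> >       "${REPO}:${TAG}" 2>/dev/null | cut -d@ -f2 || true)
>> >     # Pull only when there is no local copy or the digests differ.
>> >     if [ -z "${LOCAL}" ] || [ "${LOCAL}" != "${REMOTE}" ]; then
>> >       docker pull "${REPO}:${TAG}"
>> >     fi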
>>
>>         Most projects that use Yetus for their actual CI testing build
>> the image used for the CI as part of the CI.  It is a multi-stage,
>> multi-file docker build: each run uses a 'base' Dockerfile (provided by
>> the project) that rarely changes, plus a per-run file that Yetus
>> generates on the fly, with both images tagged by either git sha or
>> branch (depending upon context). Because docker reference-counts the
>> image layers, the docker images effectively act as a "rolling cache",
>> and (beyond a potential weekly cache removal) full builds are rare, thus
>> making them relatively cheap (typically <1m runtime) unless the base
>> image had a change far up the chain (so structure wisely).  Of course,
>> this also tests the actual image of the CI build as part of the CI.
>> ("What tests the testers?" philosophy.)  Given that Jenkins tries really
>> hard to have job affinity, re-runs were still cheap after the initial
>> one. [Of course, now that the cache is getting nuked every day....]
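>>
>>         In spirit, the build looks something like this (an illustrative
>> sketch only, not the actual Yetus code; the file path and image names
>> are made up):
>>
>>     # Stage 1: the project-provided 'base' Dockerfile. Its layers come
>>     # straight out of the local docker cache unless the file changed.
>>     BRANCH=$(git rev-parse --abbrev-ref HEAD)
>>     SHA=$(git rev-parse --short HEAD)
>>     docker build -t "example-ci-base:${BRANCH}" -f dev-support/Dockerfile .
>>
>>     # Stage 2: a tiny per-run Dockerfile generated on the fly, layered
>>     # on the base, so each run rebuilds only the last couple of layers.
>>     printf 'FROM example-ci-base:%s\nRUN useradd -m ciuser\n' "${BRANCH}" \
>>       | docker build -t "example-ci-run:${SHA}" -f - .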
>>
>>         Actually, looking at some of the ci-hadoop jobs, it looks like
>> yetus is managing the cache on them.  I'm seeing individual run containers
>> from days ago at least.  So that's a good sign.
>>
>> > Can an exemption list be passed to the script so that images matching a
>> > certain regex are excluded? You say the script ignores labels entirely,
>> > so perhaps not...
>>
>>         Patches accepted. ;)
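>>
>>         It would likely only take a few lines, something like this (an
>> untested sketch, not taken from the actual script; the pattern is
>> hypothetical):
>>
>>     # Keep images whose repository matches an exemption regex; remove
>>     # the rest. EXEMPT_RE is a made-up example pattern.
>>     EXEMPT_RE='^apache/couchdbci-'
>>     docker images --format '{{.Repository}}:{{.Tag}}' \
>>       | grep -v '<none>' \
>>       | grep -Ev "${EXEMPT_RE}" \
>>       | xargs -r docker rmi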
>>
>>         FWIW, I've been testing on my local machine for unrelated reasons
>> and I keep blowing away running containers I care about so I might end up
>> adding it myself.  That said: the code was specifically built for CI
>> systems where the expectation should be that nothing is permanent.
>>
>>
>
> --
>
> *Gavin McDonald*
> Systems Administrator
> ASF Infrastructure Team
>


-- 

*Gavin McDonald*
Systems Administrator
ASF Infrastructure Team

Re: Docker rate limits likely spell DOOM for any Apache project CI workflow relying on Docker Hub

Posted by Jarek Potiuk <Ja...@polidea.com>.
Cool!

On Mon, Nov 2, 2020 at 10:57 AM Gavin McDonald <gm...@apache.org> wrote:

> Hi All,
>
> Any project under the 'apache' org on DockerHub is not affected by the
> restrictions.
>
> Kind Regards
>
> Gavin "The futures so bright you gotta wear shades" McDonald


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129

Re: Docker rate limits likely spell DOOM for any Apache project CI workflow relying on Docker Hub

Posted by Matt Sicker <bo...@gmail.com>.
There are global Docker settings in Jenkins that apply to that, similar to
the GitHub credentials. Or they could be provided as credentials in
general. Personally, I'd lean toward an over-engineered solution like
Vault, but that's out of scope, I think.
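
Roughly, each job would authenticate before pulling, so the pulls count
against the org account instead of the anonymous per-IP limit. A sketch,
where DOCKERHUB_USER / DOCKERHUB_TOKEN are placeholder names for whatever
credentials the CI injects, not our actual credential IDs:

    # Log in with CI-injected credentials before any docker pull.
    echo "${DOCKERHUB_TOKEN}" | docker login --username "${DOCKERHUB_USER}" --password-stdin
    docker pull apache/some-image:latest   # placeholder image name
    docker logout                          # drop the login once done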

On Mon, Nov 2, 2020 at 15:16 Joan Touzet <wo...@apache.org> wrote:

> Hey Gavin,
>
> To avoid the rate limiting, this means that we need to bake CI
> credentials for accounts inside the 'apache' org into our jobs. Those
> credentials need to be used for all `docker pull` commands.
>
> How can we do this in a way that complies with ASF Infra policy?
>
> Thanks,
> Joan "the battle wages on / for Toy Soldiers" Touzet
>
>
> On 2020-11-02 4:57 a.m., Gavin McDonald wrote:
> > Hi All,
> >
> > Any project under the 'apache' org on DockerHub is not affected by the
> > restrictions.
> >
> > Kind Regards
> >
> > Gavin "The future's so bright you gotta wear shades" McDonald

Re: Docker rate limits likely spell DOOM for any Apache project CI workflow relying on Docker Hub

Posted by Joan Touzet <wo...@apache.org>.
Hey Gavin,

To avoid the rate limiting, this means that we need to bake CI
credentials for accounts inside the 'apache' org into our jobs. Those
credentials need to be used for all `docker pull` commands.

How can we do this in a way that complies with ASF Infra policy?

Thanks,
Joan "the battle wages on / for Toy Soldiers" Touzet


On 2020-11-02 4:57 a.m., Gavin McDonald wrote:
> Hi All,
> 
> Any project under the 'apache' org on DockerHub is not affected by the
> restrictions.
> 
> Kind Regards
> 
> Gavin "The future's so bright you gotta wear shades" McDonald