Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2019/06/26 16:32:25 UTC

[DISCUSS] Ongoing Travis CI service degradation

It seems that there is intermittent Apache-wide degradation of Travis
CI services -- I was looking at https://travis-ci.org/apache today and
there appeared to be a stretch of 3-4 hours where no queued builds on
github.com/apache were running at all. I initially thought that the
issue was contention with other Apache projects but even with
round-robin allocation and a concurrency limit (e.g. no Apache project
having more than 5-6 concurrent builds) that wouldn't explain why NO
builds are running.

This is obviously disturbing given how reliant we are on Travis CI to
validate patches to be merged.

I've opened a support ticket with Travis CI to see if they can provide
some insight into what's going on. There is also an INFRA ticket where
other projects have reported some similar experiences

https://issues.apache.org/jira/browse/INFRA-18533

As a meta-comment, at some point Apache Arrow is going to need to move
off of public CI services for patch validation so that we can have
unilateral control over scaling our build / test resources as the
community grows larger. As the most active merger of patches (I have
merged over 50% of pull requests over the project's history) this
affects me greatly as I am often monitoring builds on many open PRs so
that I can merge them as soon as possible. We are often resorting to
builds on contributors' forks (assuming they have enabled Travis CI /
Appveyor).

As some context around Travis CI in particular, in January Travis CI
was acquired by Idera, a private equity (I think?) developer tools
conglomerate. It's likely that we're seeing some "maximize profit,
minimize costs" behavior in play, so the recent experience could
become the new normal.

- Wes

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

> GitLab-CI integrates structured artifacts (like which
> tests pass and their output [1] and changes to continuous metrics like
> performance [2])

I didn't know that.

Our CI failures aren't only test failures. We also have lint
failures, build failures (which may be caused by external
dependencies) and so on. So our need for the structured
artifacts feature may be limited.


Thanks,
--
kou

In <87...@jedbrown.org>
  "Re: [DISCUSS] Ongoing Travis CI service degradation" on Sat, 29 Jun 2019 21:02:41 -0600,
  Jed Brown <je...@jedbrown.org> wrote:

> Sutou Kouhei <ko...@clear-code.com> writes:
> 
>> How about creating a mirror repository on
>> https://gitlab.com/ only to run CI jobs?
>>
>> This is an idea that is described in
>> https://issues.apache.org/jira/browse/ARROW-5673 .
>>
>> GitLab CI can attach external workers. So we can increase CI
>> capacity by adding our new workers. GitLab also provides
>> Docker registry. It means that we can cache built Docker
>> images for our CI. It will reduce CI time.
> 
> I have some experience with GitLab-CI.  The gitlab-runner is great and
> easy to deploy.  GitLab-CI integrates structured artifacts (like which
> tests pass and their output [1] and changes to continuous metrics like
> performance [2]) very nicely into GitLab Merge Requests, but when you
> connect to an external repository (GitHub, etc.), it only reports
> pass/fail and you can't access the structured artifacts [3], only the
> console logs and compressed archives of artifacts if you use that
> feature.
> 
> If you're happy with clicking through to console logs in case a pipeline
> fails (the Travis model), then GitLab-CI is easy to use and will serve
> your purposes.  If you really want the structured features, then I'd
> encourage you to mention that in [3].
> 
> [1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
> [2] https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
> [3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

> Currently all of the workers are running within docker daemons,
> so all of the images are cached once they are pulled to a docker
> daemon. The `worker_preparation` takes only 3 seconds:
> https://ci.ursalabs.org/#/builders/66/builds/2157

Great!

> but the package
> build scripts needs to be ported to either crossbow or ursabot or the
> new gitlab CI.

I can work on this once we decide on our approach.

> Here is a PR which passes `--runtime=nvidia` option to the docker run
> command, thus making CUDA enabled tests possible on buildbot's
> docker workers: https://github.com/ursa-labs/ursabot/pull/118

I want to use it for https://github.com/apache/arrow/pull/4735

> We can also maintain our buildbot configuration in apache/arrow
> similarly, but with a more flexible python based DSL.

I like this approach, because source code and how to test it
should be in the same place. If the source code changes, how we
test it may need to change too.


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: [DISCUSS] Ongoing Travis CI service degradation" on Sun, 30 Jun 2019 21:46:42 +0200,
  Krisztián Szűcs <sz...@gmail.com> wrote:

> On Sun, Jun 30, 2019 at 12:03 AM Sutou Kouhei <ko...@clear-code.com> wrote:
> 
>> Hi,
>>
>> How about creating a mirror repository on
>> https://gitlab.com/ only to run CI jobs?
>>
>> This is an idea that is described in
>> https://issues.apache.org/jira/browse/ARROW-5673 .
>>
> I do agree that We should investigate the features provided by
> GitLab. Buildbot might not be so familiar to others, so I'm trying
> to provide some details to see how it compares to GitLab CI.
> 
>>
>> GitLab CI can attach external workers. So we can increase CI
>> capacity by adding our new workers. GitLab also provides
>> Docker registry. It means that we can cache built Docker
>> images for our CI. It will reduce CI time.
>>
> Currently all of the workers are running within docker daemons,
> so all of the images are cached once they are pulled to a docker
> daemon. The `worker_preparation` takes only 3 seconds:
> https://ci.ursalabs.org/#/builders/66/builds/2157
> 
>>
>> The feature to create a mirror repository for CI isn't
>> included in the Free tier on https://gitlab.com/ . But
>> https://gitlab.com/ provides the Gold tier features to open
>> source project:
>> https://about.gitlab.com/solutions/github/#open-source-projects
>> So we can use this feature.
>>
>>
>> Here are advantages I think to use GitLab CI:
>>
>>   * We can increase CI capacity by adding our new workers.
>>     * GitLab Runner (CI job runner) can work on GNU/Linux, macOS
>>       and Windows: https://docs.gitlab.com/runner/#requirements
>>       It means that we can increase CI capacity of all of them.
>>
> The same is true for buildbot, the workers are basically python twisted
> applications, so we can host them on any platform.
> 
>>
>>   * We can reduce CI time by caching built Docker images.
>>     * It will reduce package build job time especially.
>>
> As I mentioned above the same is true for buildbot, but the package
> build scripts needs to be ported to either crossbow or ursabot or the
> new gitlab CI.
> Mentioning crossbow, I've recently added support for azure pipelines
> and circleci. For the docker tests (which are running the docker-compose
> commands) we could add a gitlab specific template yml to test gitlab's
> docker capabilities, see the current templates at
> https://github.com/apache/arrow/tree/master/dev/tasks/docker-tests
> 
>>
>>   * We can run CUDA related tests in CI by adding CUDA
>>     enabled workers.
>>
> Here is a PR which passes `--runtime=nvidia` option to the docker run
> command, thus making CUDA enabled tests possible on buildbot's
> docker workers: https://github.com/ursa-labs/ursabot/pull/118
> 
>>
>>   * We can manage CI jobs in https://github.com/apache/arrow
>>     repository.
>>     * GitLab CI uses .gitlab-ci.yml like .travis.yml for
>>       Travis CI.
>>
> We can also maintain our buildbot configuration in apache/arrow
> similarly, but with a more flexible python based DSL.
> 
>>
>>
>> If we create a mirror repository for CI on
>> https://gitlab.com/ , https://gitlab.com/ursa-labs/arrow
>> will be a good URL.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <CA...@mail.gmail.com>
>>   "Re: [DISCUSS] Ongoing Travis CI service degradation" on Sat, 29 Jun
>> 2019 14:54:19 -0500,
>>   Wes McKinney <we...@gmail.com> wrote:
>>
>> > hi Rok,
>> >
>> > I would guess that GitHub Actions will have the same resource and
>> > hardware limitations that Travis CI and Appveyor currently have, as
>> > well as organization-level resource contention with the rest of the
>> > ASF.
>> >
>> > We need to have dedicated, powerful hardware (more cores, more RAM),
>> > with more capabilities (architectures other than x86, and with GPUs),
>> > that can run jobs longer than 50 minutes, with the ability to scale up
>> > as the project grows in # of contributions per month. In the past
>> > month Arrow had 4300 hours of builds on Travis CI. What will happen
>> > when we need 10,000 or more hours per month to verify all of our
>> > patches? At the current rapid rate of project growth it is only a
>> > matter of time.
>> >
>> > I made a graph of commits to master by month:
>> >
>> > https://imgur.com/a/02TtGXx
>> >
>> > With nearly ~300 commits in the month of June alone, it begs the
>> > question how to support 500 commits per month, or 1000.
>> >
>> > - Wes
>> >
>> >
>> >
>> > On Sat, Jun 29, 2019 at 5:19 AM Rok Mihevc <ro...@gmail.com> wrote:
>> >>
>> >> GitHub Actions are currently in limited public beta and appear to be
>> >> similar to GitLab CI: https://github.com/features/actions
>> >> More here: https://help.github.com/en/articles/about-github-actions
>> >>
>> >> Rok
>> >>
>> >> On Fri, Jun 28, 2019 at 7:06 PM Wes McKinney <we...@gmail.com>
>> wrote:
>> >>
>> >> > Based on the discussion in
>> >> > https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
>> >> > to be ASF Infra's inclination to allow projects to donate money to the
>> >> > Foundation to get more build resources on Travis CI. Our likely only
>> >> > solution is going to be to reduce our dependence on Travis CI. In the
>> >> > short term, I would say that the sooner we can migrate all of our
>> >> > Linux builds to docker-compose form to aid in this transition, the
>> >> > better
>> >> >
>> >> > We are hiring in our organization (Ursa Labs) for a dedicated role to
>> >> > support CI and development lifecycle automation (packaging,
>> >> > benchmarking, releases, etc.) in the Apache Arrow project, so I hope
>> >> > that we can provide even more help to resolve these issues in the
>> >> > future than we already are
>> >> >
>> >> > On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org>
>> >> > wrote:
>> >> > >
>> >> > >
>> >> > > Also note that the situation with AppVeyor isn't much better.
>> >> > >
>> >> > > Any "free as in beer" CI service is probably too capacity-limited
>> for
>> >> > > our needs now, unless it allows private workers (which apparently
>> Gitlab
>> >> > > CI does).
>> >> > >
>> >> > > Regards
>> >> > >
>> >> > > Antoine.
>> >> > >
>> >> > >
>> >> > > Le 26/06/2019 à 18:32, Wes McKinney a écrit :
>> >> > > > It seems that there is intermittent Apache-wide degradation of
>> Travis
>> >> > > > CI services -- I was looking at https://travis-ci.org/apache
>> today and
>> >> > > > there appeared to be a stretch of 3-4 hours where no queued
>> builds on
>> >> > > > github.com/apache were running at all. I initially thought that
>> the
>> >> > > > issue was contention with other Apache projects but even with
>> >> > > > round-robin allocation and a concurrency limit (e.g. no Apache
>> project
>> >> > > > having more than 5-6 concurrent builds) that wouldn't explain why
>> NO
>> >> > > > builds are running.
>> >> > > >
>> >> > > > This is obviously disturbing given how reliant we are on Travis
>> CI to
>> >> > > > validate patches to be merged.
>> >> > > >
>> >> > > > I've opened a support ticket with Travis CI to see if they can
>> provide
>> >> > > > some insight into what's going on. There is also an INFRA ticket
>> where
>> >> > > > other projects have reported some similar experiences
>> >> > > >
>> >> > > > https://issues.apache.org/jira/browse/INFRA-18533
>> >> > > >
>> >> > > > As a meta-comment, at some point Apache Arrow is going to need to
>> move
>> >> > > > off of public CI services for patch validation so that we can have
>> >> > > > unilateral control over scaling our build / test resources as the
>> >> > > > community grows larger. As the most active merger of patches (I
>> have
>> >> > > > merged over 50% of pull requests over the project's history) this
>> >> > > > affects me greatly as I am often monitoring builds on many open
>> PRs so
>> >> > > > that I can merge them as soon as possible. We are often resorting
>> to
>> >> > > > builds on contributor's forks (assuming they have enabled Travis
>> CI /
>> >> > > > Appveyor)
>> >> > > >
>> >> > > > As some context around Travis CI in particular, in January Travis
>> CI
>> >> > > > was acquired by Idera, a private equity (I think?) developer tools
>> >> > > > conglomerate. It's likely that we're seeing some "maximize profit,
>> >> > > > minimize costs" behavior in play, so the recent experience could
>> >> > > > become the new normal.
>> >> > > >
>> >> > > > - Wes
>> >> > > >
>> >> >
>>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Rok Mihevc <ro...@gmail.com>.
Hey,

I was thinking more about using GitHub Actions in the same way as Kou is
proposing to use GitLab CI, basically to orchestrate the CI workload on our
own workers. Sorry for not specifying.

GitLab is currently more mature, but on the other hand we're already on
GitHub. We should probably evaluate both options if we go this way.


Rok

On Sun, Jun 30, 2019 at 12:03 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> Hi,
>
> How about creating a mirror repository on
> https://gitlab.com/ only to run CI jobs?
>
> This is an idea that is described in
> https://issues.apache.org/jira/browse/ARROW-5673 .
>
> GitLab CI can attach external workers. So we can increase CI
> capacity by adding our new workers. GitLab also provides
> Docker registry. It means that we can cache built Docker
> images for our CI. It will reduce CI time.
>
> The feature to create a mirror repository for CI isn't
> included in the Free tier on https://gitlab.com/ . But
> https://gitlab.com/ provides the Gold tier features to open
> source project:
> https://about.gitlab.com/solutions/github/#open-source-projects
> So we can use this feature.
>
>
> Here are advantages I think to use GitLab CI:
>
>   * We can increase CI capacity by adding our new workers.
>     * GitLab Runner (CI job runner) can work on GNU/Linux, macOS
>       and Windows: https://docs.gitlab.com/runner/#requirements
>       It means that we can increase CI capacity of all of them.
>
>   * We can reduce CI time by caching built Docker images.
>     * It will reduce package build job time especially.
>
>   * We can run CUDA related tests in CI by adding CUDA
>     enabled workers.
>
>   * We can manage CI jobs in https://github.com/apache/arrow
>     repository.
>     * GitLab CI uses .gitlab-ci.yml like .travis.yml for
>       Travis CI.
>
>
> If we create a mirror repository for CI on
> https://gitlab.com/ , https://gitlab.com/ursa-labs/arrow
> will be a good URL.
>
>
> Thanks,
> --
> kou
>
> In <CA...@mail.gmail.com>
>   "Re: [DISCUSS] Ongoing Travis CI service degradation" on Sat, 29 Jun
> 2019 14:54:19 -0500,
>   Wes McKinney <we...@gmail.com> wrote:
>
> > hi Rok,
> >
> > I would guess that GitHub Actions will have the same resource and
> > hardware limitations that Travis CI and Appveyor currently have, as
> > well as organization-level resource contention with the rest of the
> > ASF.
> >
> > We need to have dedicated, powerful hardware (more cores, more RAM),
> > with more capabilities (architectures other than x86, and with GPUs),
> > that can run jobs longer than 50 minutes, with the ability to scale up
> > as the project grows in # of contributions per month. In the past
> > month Arrow had 4300 hours of builds on Travis CI. What will happen
> > when we need 10,000 or more hours per month to verify all of our
> > patches? At the current rapid rate of project growth it is only a
> > matter of time.
> >
> > I made a graph of commits to master by month:
> >
> > https://imgur.com/a/02TtGXx
> >
> > With nearly ~300 commits in the month of June alone, it begs the
> > question how to support 500 commits per month, or 1000.
> >
> > - Wes
> >
> >
> >
> > On Sat, Jun 29, 2019 at 5:19 AM Rok Mihevc <ro...@gmail.com> wrote:
> >>
> >> GitHub Actions are currently in limited public beta and appear to be
> >> similar to GitLab CI: https://github.com/features/actions
> >> More here: https://help.github.com/en/articles/about-github-actions
> >>
> >> Rok
> >>
> >> On Fri, Jun 28, 2019 at 7:06 PM Wes McKinney <we...@gmail.com>
> wrote:
> >>
> >> > Based on the discussion in
> >> > https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
> >> > to be ASF Infra's inclination to allow projects to donate money to the
> >> > Foundation to get more build resources on Travis CI. Our likely only
> >> > solution is going to be to reduce our dependence on Travis CI. In the
> >> > short term, I would say that the sooner we can migrate all of our
> >> > Linux builds to docker-compose form to aid in this transition, the
> >> > better
> >> >
> >> > We are hiring in our organization (Ursa Labs) for a dedicated role to
> >> > support CI and development lifecycle automation (packaging,
> >> > benchmarking, releases, etc.) in the Apache Arrow project, so I hope
> >> > that we can provide even more help to resolve these issues in the
> >> > future than we already are
> >> >
> >> > On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org>
> >> > wrote:
> >> > >
> >> > >
> >> > > Also note that the situation with AppVeyor isn't much better.
> >> > >
> >> > > Any "free as in beer" CI service is probably too capacity-limited
> for
> >> > > our needs now, unless it allows private workers (which apparently
> Gitlab
> >> > > CI does).
> >> > >
> >> > > Regards
> >> > >
> >> > > Antoine.
> >> > >
> >> > >
> >> > > Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> >> > > > It seems that there is intermittent Apache-wide degradation of
> Travis
> >> > > > CI services -- I was looking at https://travis-ci.org/apache
> today and
> >> > > > there appeared to be a stretch of 3-4 hours where no queued
> builds on
> >> > > > github.com/apache were running at all. I initially thought that
> the
> >> > > > issue was contention with other Apache projects but even with
> >> > > > round-robin allocation and a concurrency limit (e.g. no Apache
> project
> >> > > > having more than 5-6 concurrent builds) that wouldn't explain why
> NO
> >> > > > builds are running.
> >> > > >
> >> > > > This is obviously disturbing given how reliant we are on Travis
> CI to
> >> > > > validate patches to be merged.
> >> > > >
> >> > > > I've opened a support ticket with Travis CI to see if they can
> provide
> >> > > > some insight into what's going on. There is also an INFRA ticket
> where
> >> > > > other projects have reported some similar experiences
> >> > > >
> >> > > > https://issues.apache.org/jira/browse/INFRA-18533
> >> > > >
> >> > > > As a meta-comment, at some point Apache Arrow is going to need to
> move
> >> > > > off of public CI services for patch validation so that we can have
> >> > > > unilateral control over scaling our build / test resources as the
> >> > > > community grows larger. As the most active merger of patches (I
> have
> >> > > > merged over 50% of pull requests over the project's history) this
> >> > > > affects me greatly as I am often monitoring builds on many open
> PRs so
> >> > > > that I can merge them as soon as possible. We are often resorting
> to
> >> > > > builds on contributor's forks (assuming they have enabled Travis
> CI /
> >> > > > Appveyor)
> >> > > >
> >> > > > As some context around Travis CI in particular, in January Travis
> CI
> >> > > > was acquired by Idera, a private equity (I think?) developer tools
> >> > > > conglomerate. It's likely that we're seeing some "maximize profit,
> >> > > > minimize costs" behavior in play, so the recent experience could
> >> > > > become the new normal.
> >> > > >
> >> > > > - Wes
> >> > > >
> >> >
>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Wes McKinney <we...@gmail.com>.
On Sun, Jun 30, 2019 at 3:03 PM Krisztián Szűcs
<sz...@gmail.com> wrote:
>
> On Sun, Jun 30, 2019 at 9:12 PM Wes McKinney <we...@gmail.com> wrote:
>
> > I've justed created a parent JIRA for Docker-ifying all of the Linux
> > builds in Travis CI
> >
> > https://issues.apache.org/jira/browse/ARROW-5801
> >
> > I did Java here since it was one of the easier ones
> >
> > https://github.com/apache/arrow/pull/4761
>
> I've written most of the Dockerfiles in the arrow repository and
> writing hierarchical images (without native docker support for it)
> and maintaining them is really painful, error prone and hardly testable.
>
> So I've started to write a small tool in order to overcome these
> limitations, see https://github.com/ursa-labs/ursabot#define-docker-images
>

This tool looks cool. What do you think about contributing it
to Apache Arrow so that we can use it to generate some of our
Dockerfiles?

> >
> >
> > Expensive Docker images can be pushed to @ursalab on Docker Hub
> > (https://cloud.docker.com/u/ursalab/repository/list) -- I will be
> > happy to give any Arrow committer access to this Docker Hub
> > organization to help maintain the images in the short term.
> >
> These docker images are built and pushed automatically by the tool
> described above. Here you can find the DSL used for those
> images:
> https://github.com/ursa-labs/ursabot/blob/master/ursabot/docker.py#L372-L540
>
> >
> > Some of the others will require more work.
> >
> > We should think about how to refactor the build scripts for macOS in a
> > way that is decoupled from Travis CI environment variables and custom
> > build image details so they can more easily be run on arbitrary macOS
> > build workers.
> >
> > On Sun, Jun 30, 2019 at 11:26 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > > GitLab is currently more mature but on the other hand we're already on
> > > > GitHub. We should probably evaluate both options if we go this way.
> > >
> > > We have to keep the code repository on GitHub because all Apache
> > > projects are on GitHub now. How projects manage patches and CI is up
> > > to each project, though. Other projects I'm familiar with like Apache
> > > Kudu and Apache Impala use Gerrit and Jenkins for their code review
> > > and CI, respectively.
> > >
> > > If we can use GitLab CI and get it to make status reports into our PR
> > > queue on GitHub, that would be nice. Buildbot is another option
> > > (though they are not mutually exclusive). My colleagues and I plan go
> > > to continue investing in our Buildbot infrastructure though not
> > > necessarily to the exclusion of other things.
> > >
> > > Perhaps we can set up a proof of concept on GitLab to see what things
> > > look like. I've set up a repository mirror at
> > >
> > > https://gitlab.com/ursa-labs/arrow
> > >
> > > where we can experiment
> > >
> > > On Sat, Jun 29, 2019 at 10:02 PM Jed Brown <je...@jedbrown.org> wrote:
> > > >
> > > > Sutou Kouhei <ko...@clear-code.com> writes:
> > > >
> > > > > How about creating a mirror repository on
> > > > > https://gitlab.com/ only to run CI jobs?
> > > > >
> > > > > This is an idea that is described in
> > > > > https://issues.apache.org/jira/browse/ARROW-5673 .
> > > > >
> > > > > GitLab CI can attach external workers. So we can increase CI
> > > > > capacity by adding our new workers. GitLab also provides
> > > > > Docker registry. It means that we can cache built Docker
> > > > > images for our CI. It will reduce CI time.
> > > >
> > > > I have some experience with GitLab-CI.  The gitlab-runner is great and
> > > > easy to deploy.  GitLab-CI integrates structured artifacts (like which
> > > > tests pass and their output [1] and changes to continuous metrics like
> > > > performance [2]) very nicely into GitLab Merge Requests, but when you
> > > > connect to an external repository (GitHub, etc.), it only reports
> > > > pass/fail and you can't access the structured artifacts [3], only the
> > > > console logs and compressed archives of artifacts if you use that
> > > > feature.
> > > >
> > > > If you're happy with clicking through to console logs in case a
> > pipeline
> > > > fails (the Travis model), then GitLab-CI is easy to use and will serve
> > > > your purposes.  If you really want the structured features, then I'd
> > > > encourage you to mention that in [3].
> > > >
> > > > [1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
> > > > [2]
> > https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
> > > > [3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158
> >

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Sun, Jun 30, 2019 at 9:12 PM Wes McKinney <we...@gmail.com> wrote:

> I've justed created a parent JIRA for Docker-ifying all of the Linux
> builds in Travis CI
>
> https://issues.apache.org/jira/browse/ARROW-5801
>
> I did Java here since it was one of the easier ones
>
> https://github.com/apache/arrow/pull/4761

I've written most of the Dockerfiles in the arrow repository, and writing
hierarchical images (without native docker support for it) and maintaining
them is really painful, error-prone, and hard to test.

So I've started to write a small tool in order to overcome these
limitations, see https://github.com/ursa-labs/ursabot#define-docker-images
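
To give a rough idea of the approach, here is a stripped-down sketch of
declaring hierarchical images in Python. It is not ursabot's actual API;
the class, image names, and install commands are illustrative only:

    # Minimal sketch of declaring hierarchical Docker images in Python.
    # Not ursabot's actual API; all names here are illustrative.

    class Image:
        def __init__(self, name, base, steps=()):
            self.name = name          # tag of the image to build
            self.base = base          # another Image or a base image string
            self.steps = list(steps)  # shell commands layered on the base

        def dockerfile(self):
            base_ref = self.base.name if isinstance(self.base, Image) else self.base
            lines = ["FROM " + base_ref]
            lines += ["RUN " + cmd for cmd in self.steps]
            return "\n".join(lines)

    # Derived images inherit their parent's layers simply by FROM-ing it.
    ubuntu = Image("arrow-ubuntu", "ubuntu:18.04",
                   ["apt-get update && apt-get install -y build-essential cmake"])
    cpp = Image("arrow-cpp", ubuntu,
                ["git clone https://github.com/apache/arrow /arrow"])

    for image in (ubuntu, cpp):
        print("# " + image.name)
        print(image.dockerfile())
        print()

The real tool does considerably more (building, testing and pushing the
images), but expressing the hierarchy in code is the important part.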

>
>
> Expensive Docker images can be pushed to @ursalab on Docker Hub
> (https://cloud.docker.com/u/ursalab/repository/list) -- I will be
> happy to give any Arrow committer access to this Docker Hub
> organization to help maintain the images in the short term.
>
These docker images are built and pushed automatically by the tool
described above. Here you can find the DSL used for those
images:
https://github.com/ursa-labs/ursabot/blob/master/ursabot/docker.py#L372-L540

>
> Some of the others will require more work.
>
> We should think about how to refactor the build scripts for macOS in a
> way that is decoupled from Travis CI environment variables and custom
> build image details so they can more easily be run on arbitrary macOS
> build workers.
>
> On Sun, Jun 30, 2019 at 11:26 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > > GitLab is currently more mature but on the other hand we're already on
> > > GitHub. We should probably evaluate both options if we go this way.
> >
> > We have to keep the code repository on GitHub because all Apache
> > projects are on GitHub now. How projects manage patches and CI is up
> > to each project, though. Other projects I'm familiar with like Apache
> > Kudu and Apache Impala use Gerrit and Jenkins for their code review
> > and CI, respectively.
> >
> > If we can use GitLab CI and get it to make status reports into our PR
> > queue on GitHub, that would be nice. Buildbot is another option
> > (though they are not mutually exclusive). My colleagues and I plan go
> > to continue investing in our Buildbot infrastructure though not
> > necessarily to the exclusion of other things.
> >
> > Perhaps we can set up a proof of concept on GitLab to see what things
> > look like. I've set up a repository mirror at
> >
> > https://gitlab.com/ursa-labs/arrow
> >
> > where we can experiment
> >
> > On Sat, Jun 29, 2019 at 10:02 PM Jed Brown <je...@jedbrown.org> wrote:
> > >
> > > Sutou Kouhei <ko...@clear-code.com> writes:
> > >
> > > > How about creating a mirror repository on
> > > > https://gitlab.com/ only to run CI jobs?
> > > >
> > > > This is an idea that is described in
> > > > https://issues.apache.org/jira/browse/ARROW-5673 .
> > > >
> > > > GitLab CI can attach external workers. So we can increase CI
> > > > capacity by adding our new workers. GitLab also provides
> > > > Docker registry. It means that we can cache built Docker
> > > > images for our CI. It will reduce CI time.
> > >
> > > I have some experience with GitLab-CI.  The gitlab-runner is great and
> > > easy to deploy.  GitLab-CI integrates structured artifacts (like which
> > > tests pass and their output [1] and changes to continuous metrics like
> > > performance [2]) very nicely into GitLab Merge Requests, but when you
> > > connect to an external repository (GitHub, etc.), it only reports
> > > pass/fail and you can't access the structured artifacts [3], only the
> > > console logs and compressed archives of artifacts if you use that
> > > feature.
> > >
> > > If you're happy with clicking through to console logs in case a
> pipeline
> > > fails (the Travis model), then GitLab-CI is easy to use and will serve
> > > your purposes.  If you really want the structured features, then I'd
> > > encourage you to mention that in [3].
> > >
> > > [1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
> > > [2]
> https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
> > > [3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158
>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Wes McKinney <we...@gmail.com>.
I've just created a parent JIRA for Docker-ifying all of the Linux
builds in Travis CI

https://issues.apache.org/jira/browse/ARROW-5801

I did Java here since it was one of the easier ones

https://github.com/apache/arrow/pull/4761

Expensive Docker images can be pushed to @ursalab on Docker Hub
(https://cloud.docker.com/u/ursalab/repository/list) -- I will be
happy to give any Arrow committer access to this Docker Hub
organization to help maintain the images in the short term.

Some of the others will require more work.

We should think about how to refactor the build scripts for macOS in a
way that is decoupled from Travis CI environment variables and custom
build image details so they can more easily be run on arbitrary macOS
build workers.
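
As a rough sketch of what that decoupling could look like, a build step
could read generic parameters with defaults instead of Travis-specific
variables. The ARROW_* variable names below are hypothetical, not the
variables our scripts actually use today:

    # Hypothetical example: drive a build step from generic environment
    # variables with defaults, rather than TRAVIS_* ones, so the same
    # script can run on any macOS worker.
    import os
    import subprocess

    source_dir = os.environ.get("ARROW_SOURCE_DIR", os.getcwd())
    build_type = os.environ.get("ARROW_BUILD_TYPE", "debug")
    jobs = os.environ.get("ARROW_BUILD_JOBS", "4")

    subprocess.run(["cmake", "-DCMAKE_BUILD_TYPE=" + build_type, source_dir],
                   check=True)
    subprocess.run(["make", "-j" + jobs], check=True)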

On Sun, Jun 30, 2019 at 11:26 AM Wes McKinney <we...@gmail.com> wrote:
>
> > GitLab is currently more mature but on the other hand we're already on
> > GitHub. We should probably evaluate both options if we go this way.
>
> We have to keep the code repository on GitHub because all Apache
> projects are on GitHub now. How projects manage patches and CI is up
> to each project, though. Other projects I'm familiar with like Apache
> Kudu and Apache Impala use Gerrit and Jenkins for their code review
> and CI, respectively.
>
> If we can use GitLab CI and get it to make status reports into our PR
> queue on GitHub, that would be nice. Buildbot is another option
> (though they are not mutually exclusive). My colleagues and I plan go
> to continue investing in our Buildbot infrastructure though not
> necessarily to the exclusion of other things.
>
> Perhaps we can set up a proof of concept on GitLab to see what things
> look like. I've set up a repository mirror at
>
> https://gitlab.com/ursa-labs/arrow
>
> where we can experiment
>
> On Sat, Jun 29, 2019 at 10:02 PM Jed Brown <je...@jedbrown.org> wrote:
> >
> > Sutou Kouhei <ko...@clear-code.com> writes:
> >
> > > How about creating a mirror repository on
> > > https://gitlab.com/ only to run CI jobs?
> > >
> > > This is an idea that is described in
> > > https://issues.apache.org/jira/browse/ARROW-5673 .
> > >
> > > GitLab CI can attach external workers. So we can increase CI
> > > capacity by adding our new workers. GitLab also provides
> > > Docker registry. It means that we can cache built Docker
> > > images for our CI. It will reduce CI time.
> >
> > I have some experience with GitLab-CI.  The gitlab-runner is great and
> > easy to deploy.  GitLab-CI integrates structured artifacts (like which
> > tests pass and their output [1] and changes to continuous metrics like
> > performance [2]) very nicely into GitLab Merge Requests, but when you
> > connect to an external repository (GitHub, etc.), it only reports
> > pass/fail and you can't access the structured artifacts [3], only the
> > console logs and compressed archives of artifacts if you use that
> > feature.
> >
> > If you're happy with clicking through to console logs in case a pipeline
> > fails (the Travis model), then GitLab-CI is easy to use and will serve
> > your purposes.  If you really want the structured features, then I'd
> > encourage you to mention that in [3].
> >
> > [1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
> > [2] https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
> > [3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Sun, Jun 30, 2019 at 6:27 PM Wes McKinney <we...@gmail.com> wrote:

> > GitLab is currently more mature but on the other hand we're already on
> > GitHub. We should probably evaluate both options if we go this way.
>
> We have to keep the code repository on GitHub because all Apache
> projects are on GitHub now. How projects manage patches and CI is up
> to each project, though. Other projects I'm familiar with like Apache
> Kudu and Apache Impala use Gerrit and Jenkins for their code review
> and CI, respectively.
>
> If we can use GitLab CI and get it to make status reports into our PR
> queue on GitHub, that would be nice. Buildbot is another option
> (though they are not mutually exclusive). My colleagues and I plan go
> to continue investing in our Buildbot infrastructure though not
> necessarily to the exclusion of other things.
>
I'm not against GitLab, but let me mention that with buildbot we can also
DEVELOP our own continuous integration system, not just CONFIGURE it.

More esoteric features like a comment bot still require custom solutions,
and buildbot provides a great framework to achieve them in a maintainable
fashion.
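
At its core such a comment bot boils down to an authenticated call to the
GitHub REST API, roughly like the sketch below. This is not ursabot's
implementation; the token handling and the message are placeholders:

    # Core of a PR comment bot: post a message on a GitHub pull request.
    # Not ursabot's implementation; token and message are placeholders.
    import os
    import requests

    def post_pr_comment(repo, pr_number, body):
        # Pull requests share the issue comment endpoint on the GitHub API.
        url = "https://api.github.com/repos/%s/issues/%d/comments" % (repo, pr_number)
        headers = {"Authorization": "token " + os.environ["GITHUB_TOKEN"]}
        response = requests.post(url, json={"body": body}, headers=headers)
        response.raise_for_status()

    post_pr_comment("apache/arrow", 4761, "ursabot build started")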

>
> Perhaps we can set up a proof of concept on GitLab to see what things
> look like. I've set up a repository mirror at
>
> https://gitlab.com/ursa-labs/arrow
>
> where we can experiment
>
> On Sat, Jun 29, 2019 at 10:02 PM Jed Brown <je...@jedbrown.org> wrote:
> >
> > Sutou Kouhei <ko...@clear-code.com> writes:
> >
> > > How about creating a mirror repository on
> > > https://gitlab.com/ only to run CI jobs?
> > >
> > > This is an idea that is described in
> > > https://issues.apache.org/jira/browse/ARROW-5673 .
> > >
> > > GitLab CI can attach external workers. So we can increase CI
> > > capacity by adding our new workers. GitLab also provides
> > > Docker registry. It means that we can cache built Docker
> > > images for our CI. It will reduce CI time.
> >
> > I have some experience with GitLab-CI.  The gitlab-runner is great and
> > easy to deploy.  GitLab-CI integrates structured artifacts (like which
> > tests pass and their output [1] and changes to continuous metrics like
> > performance [2]) very nicely into GitLab Merge Requests, but when you
> > connect to an external repository (GitHub, etc.), it only reports
> > pass/fail and you can't access the structured artifacts [3], only the
> > console logs and compressed archives of artifacts if you use that
> > feature.
> >
> > If you're happy with clicking through to console logs in case a pipeline
> > fails (the Travis model), then GitLab-CI is easy to use and will serve
> > your purposes.  If you really want the structured features, then I'd
> > encourage you to mention that in [3].
> >
> > [1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
> > [2]
> https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
> > [3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158
>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Wes McKinney <we...@gmail.com>.
> GitLab is currently more mature but on the other hand we're already on
> GitHub. We should probably evaluate both options if we go this way.

We have to keep the code repository on GitHub because all Apache
projects are on GitHub now. How projects manage patches and CI is up
to each project, though. Other projects I'm familiar with like Apache
Kudu and Apache Impala use Gerrit and Jenkins for their code review
and CI, respectively.

If we can use GitLab CI and get it to make status reports into our PR
queue on GitHub, that would be nice. Buildbot is another option
(though they are not mutually exclusive). My colleagues and I plan
to continue investing in our Buildbot infrastructure, though not
necessarily to the exclusion of other things.

Perhaps we can set up a proof of concept on GitLab to see what things
look like. I've set up a repository mirror at

https://gitlab.com/ursa-labs/arrow

where we can experiment

On Sat, Jun 29, 2019 at 10:02 PM Jed Brown <je...@jedbrown.org> wrote:
>
> Sutou Kouhei <ko...@clear-code.com> writes:
>
> > How about creating a mirror repository on
> > https://gitlab.com/ only to run CI jobs?
> >
> > This is an idea that is described in
> > https://issues.apache.org/jira/browse/ARROW-5673 .
> >
> > GitLab CI can attach external workers. So we can increase CI
> > capacity by adding our new workers. GitLab also provides
> > Docker registry. It means that we can cache built Docker
> > images for our CI. It will reduce CI time.
>
> I have some experience with GitLab-CI.  The gitlab-runner is great and
> easy to deploy.  GitLab-CI integrates structured artifacts (like which
> tests pass and their output [1] and changes to continuous metrics like
> performance [2]) very nicely into GitLab Merge Requests, but when you
> connect to an external repository (GitHub, etc.), it only reports
> pass/fail and you can't access the structured artifacts [3], only the
> console logs and compressed archives of artifacts if you use that
> feature.
>
> If you're happy with clicking through to console logs in case a pipeline
> fails (the Travis model), then GitLab-CI is easy to use and will serve
> your purposes.  If you really want the structured features, then I'd
> encourage you to mention that in [3].
>
> [1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
> [2] https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
> [3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Jed Brown <je...@jedbrown.org>.
Sutou Kouhei <ko...@clear-code.com> writes:

> How about creating a mirror repository on
> https://gitlab.com/ only to run CI jobs?
>
> This is an idea that is described in
> https://issues.apache.org/jira/browse/ARROW-5673 .
>
> GitLab CI can attach external workers. So we can increase CI
> capacity by adding our new workers. GitLab also provides
> Docker registry. It means that we can cache built Docker
> images for our CI. It will reduce CI time.

I have some experience with GitLab-CI.  The gitlab-runner is great and
easy to deploy.  GitLab-CI integrates structured artifacts (like which
tests pass and their output [1] and changes to continuous metrics like
performance [2]) very nicely into GitLab Merge Requests, but when you
connect to an external repository (GitHub, etc.), it only reports
pass/fail and you can't access the structured artifacts [3], only the
console logs and compressed archives of artifacts if you use that
feature.

If you're happy with clicking through to console logs in case a pipeline
fails (the Travis model), then GitLab-CI is easy to use and will serve
your purposes.  If you really want the structured features, then I'd
encourage you to mention that in [3].

[1] https://docs.gitlab.com/ee/ci/junit_test_reports.html#how-it-works
[2] https://docs.gitlab.com/ee/user/project/merge_requests/browser_performance_testing.html#how-it-works
[3] https://gitlab.com/gitlab-org/gitlab-ce/issues/60158

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Sun, Jun 30, 2019 at 12:03 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> Hi,
>
> How about creating a mirror repository on
> https://gitlab.com/ only to run CI jobs?
>
> This is an idea that is described in
> https://issues.apache.org/jira/browse/ARROW-5673 .
>
I do agree that we should investigate the features provided by
GitLab. Buildbot might not be so familiar to others, so I'm trying
to provide some details to see how it compares to GitLab CI.

>
> GitLab CI can attach external workers. So we can increase CI
> capacity by adding our new workers. GitLab also provides
> Docker registry. It means that we can cache built Docker
> images for our CI. It will reduce CI time.
>
Currently all of the workers are running within docker daemons,
so all of the images are cached once they are pulled to a docker
daemon. The `worker_preparation` step takes only 3 seconds:
https://ci.ursalabs.org/#/builders/66/builds/2157

>
> The feature to create a mirror repository for CI isn't
> included in the Free tier on https://gitlab.com/ . But
> https://gitlab.com/ provides the Gold tier features to open
> source project:
> https://about.gitlab.com/solutions/github/#open-source-projects
> So we can use this feature.
>
>
> Here are advantages I think to use GitLab CI:
>
>   * We can increase CI capacity by adding our new workers.
>     * GitLab Runner (CI job runner) can work on GNU/Linux, macOS
>       and Windows: https://docs.gitlab.com/runner/#requirements
>       It means that we can increase CI capacity of all of them.
>
The same is true for buildbot: the workers are basically Python Twisted
applications, so we can host them on any platform.

>
>   * We can reduce CI time by caching built Docker images.
>     * It will reduce package build job time especially.
>
As I mentioned above, the same is true for buildbot, but the package
build scripts need to be ported to either crossbow, ursabot, or the
new GitLab CI.
Speaking of crossbow, I've recently added support for Azure Pipelines
and CircleCI. For the docker tests (which run the docker-compose
commands) we could add a GitLab-specific template yml to test GitLab's
docker capabilities; see the current templates at
https://github.com/apache/arrow/tree/master/dev/tasks/docker-tests

>
>   * We can run CUDA related tests in CI by adding CUDA
>     enabled workers.
>
Here is a PR which passes the `--runtime=nvidia` option to the docker run
command, thus making CUDA-enabled tests possible on buildbot's
docker workers: https://github.com/ursa-labs/ursabot/pull/118
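
For reference, this is roughly what it amounts to when launching a
container from Python with the docker SDK (docker-py); the image tag and
command below are just examples, not what the workers actually run:

    # Rough equivalent of `docker run --runtime=nvidia` using the docker
    # Python SDK (docker-py); image tag and command are examples only.
    import docker

    client = docker.from_env()
    logs = client.containers.run(
        "nvidia/cuda:10.0-base",  # any CUDA-enabled image
        "nvidia-smi",             # check the GPU is visible inside the container
        runtime="nvidia",
        remove=True,
    )
    print(logs.decode())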

>
>   * We can manage CI jobs in https://github.com/apache/arrow
>     repository.
>     * GitLab CI uses .gitlab-ci.yml like .travis.yml for
>       Travis CI.
>
We can also maintain our buildbot configuration in apache/arrow
similarly, but with a more flexible Python-based DSL.
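
For a feel of what that looks like, here is a minimal buildbot master.cfg
fragment with a single docker-compose based builder. The builder and
worker names are made up for illustration, not our actual configuration,
and a complete master.cfg also needs schedulers and protocol settings:

    # Minimal buildbot master.cfg fragment with one docker-compose builder.
    # Builder/worker names are illustrative, not Arrow's actual setup.
    from buildbot.plugins import steps, util, worker

    c = BuildmasterConfig = {}
    c['workers'] = [worker.Worker("docker-worker", "password")]

    factory = util.BuildFactory()
    factory.addStep(steps.Git(repourl="https://github.com/apache/arrow",
                              mode="incremental"))
    factory.addStep(steps.ShellCommand(command=["docker-compose", "build", "cpp"]))
    factory.addStep(steps.ShellCommand(command=["docker-compose", "run", "cpp"]))

    c['builders'] = [
        util.BuilderConfig(name="arrow-cpp-docker",
                           workernames=["docker-worker"],
                           factory=factory),
    ]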

>
>
> If we create a mirror repository for CI on
> https://gitlab.com/ , https://gitlab.com/ursa-labs/arrow
> will be a good URL.
>
>
> Thanks,
> --
> kou
>
> In <CA...@mail.gmail.com>
>   "Re: [DISCUSS] Ongoing Travis CI service degradation" on Sat, 29 Jun
> 2019 14:54:19 -0500,
>   Wes McKinney <we...@gmail.com> wrote:
>
> > hi Rok,
> >
> > I would guess that GitHub Actions will have the same resource and
> > hardware limitations that Travis CI and Appveyor currently have, as
> > well as organization-level resource contention with the rest of the
> > ASF.
> >
> > We need to have dedicated, powerful hardware (more cores, more RAM),
> > with more capabilities (architectures other than x86, and with GPUs),
> > that can run jobs longer than 50 minutes, with the ability to scale up
> > as the project grows in # of contributions per month. In the past
> > month Arrow had 4300 hours of builds on Travis CI. What will happen
> > when we need 10,000 or more hours per month to verify all of our
> > patches? At the current rapid rate of project growth it is only a
> > matter of time.
> >
> > I made a graph of commits to master by month:
> >
> > https://imgur.com/a/02TtGXx
> >
> > With nearly ~300 commits in the month of June alone, it begs the
> > question how to support 500 commits per month, or 1000.
> >
> > - Wes
> >
> >
> >
> > On Sat, Jun 29, 2019 at 5:19 AM Rok Mihevc <ro...@gmail.com> wrote:
> >>
> >> GitHub Actions are currently in limited public beta and appear to be
> >> similar to GitLab CI: https://github.com/features/actions
> >> More here: https://help.github.com/en/articles/about-github-actions
> >>
> >> Rok
> >>
> >> On Fri, Jun 28, 2019 at 7:06 PM Wes McKinney <we...@gmail.com>
> wrote:
> >>
> >> > Based on the discussion in
> >> > https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
> >> > to be ASF Infra's inclination to allow projects to donate money to the
> >> > Foundation to get more build resources on Travis CI. Our likely only
> >> > solution is going to be to reduce our dependence on Travis CI. In the
> >> > short term, I would say that the sooner we can migrate all of our
> >> > Linux builds to docker-compose form to aid in this transition, the
> >> > better
> >> >
> >> > We are hiring in our organization (Ursa Labs) for a dedicated role to
> >> > support CI and development lifecycle automation (packaging,
> >> > benchmarking, releases, etc.) in the Apache Arrow project, so I hope
> >> > that we can provide even more help to resolve these issues in the
> >> > future than we already are
> >> >
> >> > On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org>
> >> > wrote:
> >> > >
> >> > >
> >> > > Also note that the situation with AppVeyor isn't much better.
> >> > >
> >> > > Any "free as in beer" CI service is probably too capacity-limited
> for
> >> > > our needs now, unless it allows private workers (which apparently
> Gitlab
> >> > > CI does).
> >> > >
> >> > > Regards
> >> > >
> >> > > Antoine.
> >> > >
> >> > >
> >> > > Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> >> > > > It seems that there is intermittent Apache-wide degradation of
> Travis
> >> > > > CI services -- I was looking at https://travis-ci.org/apache
> today and
> >> > > > there appeared to be a stretch of 3-4 hours where no queued
> builds on
> >> > > > github.com/apache were running at all. I initially thought that
> the
> >> > > > issue was contention with other Apache projects but even with
> >> > > > round-robin allocation and a concurrency limit (e.g. no Apache
> project
> >> > > > having more than 5-6 concurrent builds) that wouldn't explain why
> NO
> >> > > > builds are running.
> >> > > >
> >> > > > This is obviously disturbing given how reliant we are on Travis
> CI to
> >> > > > validate patches to be merged.
> >> > > >
> >> > > > I've opened a support ticket with Travis CI to see if they can
> provide
> >> > > > some insight into what's going on. There is also an INFRA ticket
> where
> >> > > > other projects have reported some similar experiences
> >> > > >
> >> > > > https://issues.apache.org/jira/browse/INFRA-18533
> >> > > >
> >> > > > As a meta-comment, at some point Apache Arrow is going to need to
> move
> >> > > > off of public CI services for patch validation so that we can have
> >> > > > unilateral control over scaling our build / test resources as the
> >> > > > community grows larger. As the most active merger of patches (I
> have
> >> > > > merged over 50% of pull requests over the project's history) this
> >> > > > affects me greatly as I am often monitoring builds on many open
> PRs so
> >> > > > that I can merge them as soon as possible. We are often resorting
> to
> >> > > > builds on contributor's forks (assuming they have enabled Travis
> CI /
> >> > > > Appveyor)
> >> > > >
> >> > > > As some context around Travis CI in particular, in January Travis
> CI
> >> > > > was acquired by Idera, a private equity (I think?) developer tools
> >> > > > conglomerate. It's likely that we're seeing some "maximize profit,
> >> > > > minimize costs" behavior in play, so the recent experience could
> >> > > > become the new normal.
> >> > > >
> >> > > > - Wes
> >> > > >
> >> >
>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

How about creating a mirror repository on
https://gitlab.com/ only to run CI jobs?

This is an idea that is described in
https://issues.apache.org/jira/browse/ARROW-5673 .

GitLab CI can attach external workers, so we can increase CI
capacity by adding our own workers. GitLab also provides a
Docker registry, which means that we can cache built Docker
images for our CI. This will reduce CI time.

The feature to create a mirror repository for CI isn't
included in the Free tier on https://gitlab.com/ , but
https://gitlab.com/ provides the Gold tier features to open
source projects:
https://about.gitlab.com/solutions/github/#open-source-projects
So we can use this feature.


Here are advantages I think to use GitLab CI:

  * We can increase CI capacity by adding our own workers.
    * GitLab Runner (CI job runner) can work on GNU/Linux, macOS
      and Windows: https://docs.gitlab.com/runner/#requirements
      This means that we can increase CI capacity on all of them.

  * We can reduce CI time by caching built Docker images.
    * It will reduce package build job time especially.

  * We can run CUDA related tests in CI by adding CUDA
    enabled workers.

  * We can manage CI jobs in https://github.com/apache/arrow
    repository.
    * GitLab CI uses .gitlab-ci.yml like .travis.yml for
      Travis CI.


If we create a mirror repository for CI on
https://gitlab.com/ , https://gitlab.com/ursa-labs/arrow
will be a good URL.


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "Re: [DISCUSS] Ongoing Travis CI service degradation" on Sat, 29 Jun 2019 14:54:19 -0500,
  Wes McKinney <we...@gmail.com> wrote:

> hi Rok,
> 
> I would guess that GitHub Actions will have the same resource and
> hardware limitations that Travis CI and Appveyor currently have, as
> well as organization-level resource contention with the rest of the
> ASF.
> 
> We need to have dedicated, powerful hardware (more cores, more RAM),
> with more capabilities (architectures other than x86, and with GPUs),
> that can run jobs longer than 50 minutes, with the ability to scale up
> as the project grows in # of contributions per month. In the past
> month Arrow had 4300 hours of builds on Travis CI. What will happen
> when we need 10,000 or more hours per month to verify all of our
> patches? At the current rapid rate of project growth it is only a
> matter of time.
> 
> I made a graph of commits to master by month:
> 
> https://imgur.com/a/02TtGXx
> 
> With nearly ~300 commits in the month of June alone, it begs the
> question how to support 500 commits per month, or 1000.
> 
> - Wes
> 
> 
> 
> On Sat, Jun 29, 2019 at 5:19 AM Rok Mihevc <ro...@gmail.com> wrote:
>>
>> GitHub Actions are currently in limited public beta and appear to be
>> similar to GitLab CI: https://github.com/features/actions
>> More here: https://help.github.com/en/articles/about-github-actions
>>
>> Rok
>>
>> On Fri, Jun 28, 2019 at 7:06 PM Wes McKinney <we...@gmail.com> wrote:
>>
>> > Based on the discussion in
>> > https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
>> > to be ASF Infra's inclination to allow projects to donate money to the
>> > Foundation to get more build resources on Travis CI. Our likely only
>> > solution is going to be to reduce our dependence on Travis CI. In the
>> > short term, I would say that the sooner we can migrate all of our
>> > Linux builds to docker-compose form to aid in this transition, the
>> > better
>> >
>> > We are hiring in our organization (Ursa Labs) for a dedicated role to
>> > support CI and development lifecycle automation (packaging,
>> > benchmarking, releases, etc.) in the Apache Arrow project, so I hope
>> > that we can provide even more help to resolve these issues in the
>> > future than we already are
>> >
>> > On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org>
>> > wrote:
>> > >
>> > >
>> > > Also note that the situation with AppVeyor isn't much better.
>> > >
>> > > Any "free as in beer" CI service is probably too capacity-limited for
>> > > our needs now, unless it allows private workers (which apparently Gitlab
>> > > CI does).
>> > >
>> > > Regards
>> > >
>> > > Antoine.
>> > >
>> > >
>> > > Le 26/06/2019 à 18:32, Wes McKinney a écrit :
>> > > > It seems that there is intermittent Apache-wide degradation of Travis
>> > > > CI services -- I was looking at https://travis-ci.org/apache today and
>> > > > there appeared to be a stretch of 3-4 hours where no queued builds on
>> > > > github.com/apache were running at all. I initially thought that the
>> > > > issue was contention with other Apache projects but even with
>> > > > round-robin allocation and a concurrency limit (e.g. no Apache project
>> > > > having more than 5-6 concurrent builds) that wouldn't explain why NO
>> > > > builds are running.
>> > > >
>> > > > This is obviously disturbing given how reliant we are on Travis CI to
>> > > > validate patches to be merged.
>> > > >
>> > > > I've opened a support ticket with Travis CI to see if they can provide
>> > > > some insight into what's going on. There is also an INFRA ticket where
>> > > > other projects have reported some similar experiences
>> > > >
>> > > > https://issues.apache.org/jira/browse/INFRA-18533
>> > > >
>> > > > As a meta-comment, at some point Apache Arrow is going to need to move
>> > > > off of public CI services for patch validation so that we can have
>> > > > unilateral control over scaling our build / test resources as the
>> > > > community grows larger. As the most active merger of patches (I have
>> > > > merged over 50% of pull requests over the project's history) this
>> > > > affects me greatly as I am often monitoring builds on many open PRs so
>> > > > that I can merge them as soon as possible. We are often resorting to
>> > > > builds on contributor's forks (assuming they have enabled Travis CI /
>> > > > Appveyor)
>> > > >
>> > > > As some context around Travis CI in particular, in January Travis CI
>> > > > was acquired by Idera, a private equity (I think?) developer tools
>> > > > conglomerate. It's likely that we're seeing some "maximize profit,
>> > > > minimize costs" behavior in play, so the recent experience could
>> > > > become the new normal.
>> > > >
>> > > > - Wes
>> > > >
>> >

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Wes McKinney <we...@gmail.com>.
hi Rok,

I would guess that GitHub Actions will have the same resource and
hardware limitations that Travis CI and Appveyor currently have, as
well as organization-level resource contention with the rest of the
ASF.

We need to have dedicated, powerful hardware (more cores, more RAM),
with more capabilities (architectures other than x86, and with GPUs),
that can run jobs longer than 50 minutes, with the ability to scale up
as the project grows in # of contributions per month. In the past
month Arrow had 4300 hours of builds on Travis CI. What will happen
when we need 10,000 or more hours per month to verify all of our
patches? At the current rapid rate of project growth it is only a
matter of time.

I made a graph of commits to master by month:

https://imgur.com/a/02TtGXx

With nearly 300 commits in the month of June alone, it raises the
question of how to support 500 commits per month, or 1000.

- Wes



On Sat, Jun 29, 2019 at 5:19 AM Rok Mihevc <ro...@gmail.com> wrote:
>
> GitHub Actions are currently in limited public beta and appear to be
> similar to GitLab CI: https://github.com/features/actions
> More here: https://help.github.com/en/articles/about-github-actions
>
> Rok
>
> On Fri, Jun 28, 2019 at 7:06 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Based on the discussion in
> > https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
> > to be ASF Infra's inclination to allow projects to donate money to the
> > Foundation to get more build resources on Travis CI. Our likely only
> > solution is going to be to reduce our dependence on Travis CI. In the
> > short term, I would say that the sooner we can migrate all of our
> > Linux builds to docker-compose form to aid in this transition, the
> > better
> >
> > We are hiring in our organization (Ursa Labs) for a dedicated role to
> > support CI and development lifecycle automation (packaging,
> > benchmarking, releases, etc.) in the Apache Arrow project, so I hope
> > that we can provide even more help to resolve these issues in the
> > future than we already are
> >
> > On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org>
> > wrote:
> > >
> > >
> > > Also note that the situation with AppVeyor isn't much better.
> > >
> > > Any "free as in beer" CI service is probably too capacity-limited for
> > > our needs now, unless it allows private workers (which apparently Gitlab
> > > CI does).
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> > > > It seems that there is intermittent Apache-wide degradation of Travis
> > > > CI services -- I was looking at https://travis-ci.org/apache today and
> > > > there appeared to be a stretch of 3-4 hours where no queued builds on
> > > > github.com/apache were running at all. I initially thought that the
> > > > issue was contention with other Apache projects but even with
> > > > round-robin allocation and a concurrency limit (e.g. no Apache project
> > > > having more than 5-6 concurrent builds) that wouldn't explain why NO
> > > > builds are running.
> > > >
> > > > This is obviously disturbing given how reliant we are on Travis CI to
> > > > validate patches to be merged.
> > > >
> > > > I've opened a support ticket with Travis CI to see if they can provide
> > > > some insight into what's going on. There is also an INFRA ticket where
> > > > other projects have reported some similar experiences
> > > >
> > > > https://issues.apache.org/jira/browse/INFRA-18533
> > > >
> > > > As a meta-comment, at some point Apache Arrow is going to need to move
> > > > off of public CI services for patch validation so that we can have
> > > > unilateral control over scaling our build / test resources as the
> > > > community grows larger. As the most active merger of patches (I have
> > > > merged over 50% of pull requests over the project's history) this
> > > > affects me greatly as I am often monitoring builds on many open PRs so
> > > > that I can merge them as soon as possible. We are often resorting to
> > > > builds on contributor's forks (assuming they have enabled Travis CI /
> > > > Appveyor)
> > > >
> > > > As some context around Travis CI in particular, in January Travis CI
> > > > was acquired by Idera, a private equity (I think?) developer tools
> > > > conglomerate. It's likely that we're seeing some "maximize profit,
> > > > minimize costs" behavior in play, so the recent experience could
> > > > become the new normal.
> > > >
> > > > - Wes
> > > >
> >

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Rok Mihevc <ro...@gmail.com>.
GitHub Actions are currently in limited public beta and appear to be
similar to GitLab CI: https://github.com/features/actions
More here: https://help.github.com/en/articles/about-github-actions

Rok

On Fri, Jun 28, 2019 at 7:06 PM Wes McKinney <we...@gmail.com> wrote:

> Based on the discussion in
> https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
> to be ASF Infra's inclination to allow projects to donate money to the
> Foundation to get more build resources on Travis CI. Our likely only
> solution is going to be to reduce our dependence on Travis CI. In the
> short term, I would say that the sooner we can migrate all of our
> Linux builds to docker-compose form to aid in this transition, the
> better
>
> We are hiring in our organization (Ursa Labs) for a dedicated role to
> support CI and development lifecycle automation (packaging,
> benchmarking, releases, etc.) in the Apache Arrow project, so I hope
> that we can provide even more help to resolve these issues in the
> future than we already are
>
> On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org>
> wrote:
> >
> >
> > Also note that the situation with AppVeyor isn't much better.
> >
> > Any "free as in beer" CI service is probably too capacity-limited for
> > our needs now, unless it allows private workers (which apparently Gitlab
> > CI does).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> > > It seems that there is intermittent Apache-wide degradation of Travis
> > > CI services -- I was looking at https://travis-ci.org/apache today and
> > > there appeared to be a stretch of 3-4 hours where no queued builds on
> > > github.com/apache were running at all. I initially thought that the
> > > issue was contention with other Apache projects but even with
> > > round-robin allocation and a concurrency limit (e.g. no Apache project
> > > having more than 5-6 concurrent builds) that wouldn't explain why NO
> > > builds are running.
> > >
> > > This is obviously disturbing given how reliant we are on Travis CI to
> > > validate patches to be merged.
> > >
> > > I've opened a support ticket with Travis CI to see if they can provide
> > > some insight into what's going on. There is also an INFRA ticket where
> > > other projects have reported some similar experiences
> > >
> > > https://issues.apache.org/jira/browse/INFRA-18533
> > >
> > > As a meta-comment, at some point Apache Arrow is going to need to move
> > > off of public CI services for patch validation so that we can have
> > > unilateral control over scaling our build / test resources as the
> > > community grows larger. As the most active merger of patches (I have
> > > merged over 50% of pull requests over the project's history) this
> > > affects me greatly as I am often monitoring builds on many open PRs so
> > > that I can merge them as soon as possible. We are often resorting to
> > > builds on contributor's forks (assuming they have enabled Travis CI /
> > > Appveyor)
> > >
> > > As some context around Travis CI in particular, in January Travis CI
> > > was acquired by Idera, a private equity (I think?) developer tools
> > > conglomerate. It's likely that we're seeing some "maximize profit,
> > > minimize costs" behavior in play, so the recent experience could
> > > become the new normal.
> > >
> > > - Wes
> > >
>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Antoine Pitrou <an...@python.org>.
Le 02/07/2019 à 19:52, Micah Kornfield a écrit :
> Would GCP Cloud Build work [1]?

The number one question is: does it offer *copious* capacity for open
source projects, for free? If it does not, it's not worth investigating
IMHO (there are dozens or even hundreds of online CI providers; we
can't go and look at them all).

Regards

Antoine.


> 
> When trying to install it looks like the permissions required are:
> * Read access to code
> * Read access to issues, metadata, and pull requests
> * Read and write access to checks and commit statuses
> 
> It looks like the free tier is quite limited, but I can try to investigate
> if we have any sponsorship programs, if it looks interesting.
> 
> 
> [1] https://cloud.google.com/cloud-build/docs/run-builds-on-github
> 
> On Tue, Jul 2, 2019 at 9:40 AM Antoine Pitrou <an...@python.org> wrote:
> 
>>
>> Le 02/07/2019 à 18:22, Eric Erhardt a écrit :
>>> Has anyone considered using Azure DevOps for CI and patch validation?
>>
>> Tried indeed and failed:
>> https://issues.apache.org/jira/browse/INFRA-17030
>>
>> Regards
>>
>> Antoine.
>>
> 

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Micah Kornfield <em...@gmail.com>.
Would GCP Cloud Build work [1]?

When trying to install it looks like the permissions required are:
* Read access to code
* Read access to issues, metadata, and pull requests
* Read and write access to checks and commit statuses

It looks like the free tier is quite limited, but I can try to investigate
if we have any sponsorship programs, if it looks interesting.


[1] https://cloud.google.com/cloud-build/docs/run-builds-on-github

On Tue, Jul 2, 2019 at 9:40 AM Antoine Pitrou <an...@python.org> wrote:

>
> Le 02/07/2019 à 18:22, Eric Erhardt a écrit :
> > Has anyone considered using Azure DevOps for CI and patch validation?
>
> Tried indeed and failed:
> https://issues.apache.org/jira/browse/INFRA-17030
>
> Regards
>
> Antoine.
>

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Antoine Pitrou <an...@python.org>.
Le 02/07/2019 à 18:22, Eric Erhardt a écrit :
> Has anyone considered using Azure DevOps for CI and patch validation?

Tried indeed and failed:
https://issues.apache.org/jira/browse/INFRA-17030

Regards

Antoine.

RE: [DISCUSS] Ongoing Travis CI service degradation

Posted by Eric Erhardt <Er...@microsoft.com.INVALID>.
Has anyone considered using Azure DevOps for CI and patch validation?

https://azure.microsoft.com/en-us/services/devops/pipelines/

> Get cloud-hosted pipelines for Linux, macOS, and Windows with unlimited minutes and 10 free parallel jobs for open source

I guess I am not familiar with ASF policies, but we've been using Azure DevOps on the .NET team for a while now (we've switched off of Jenkins) and there are some really great features. You can use cloud-hosted machines, or your own machines. It has Docker integration. And can scale up as large as necessary. It has great test failure reporting and analytics on which tests fail more often than others.

One scenario we have built on our team is an "Auto-merge" bot. It allows committers to mark a PR as "auto-mergeable"; when the validation pipeline completes successfully, the PR is automatically merged, and if new changes are pushed to the PR or the validation build fails, the auto-merge capability is switched off. This has proven super useful on my team - no more monitoring builds to see when they can be merged. You can review the change, approve it, mark it as "auto-merge", and when the validation passes, the bot merges it.
This is just an example of the types of extensions you can build on Azure DevOps.
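
As a rough illustration of what such a bot boils down to, here is a
minimal sketch against the GitHub REST API (the repository name, the
"auto-merge" label, and the token handling are placeholder assumptions;
a real deployment would also need webhooks or polling and a check that
the approver is a committer):

    # minimal sketch: merge a PR labeled "auto-merge" once its combined
    # commit status is green (repository and label names are hypothetical)
    import os
    import requests

    API = "https://api.github.com"
    REPO = "apache/arrow"  # example repository
    HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

    def try_auto_merge(pr_number):
        pr = requests.get(f"{API}/repos/{REPO}/pulls/{pr_number}",
                          headers=HEADERS).json()
        if "auto-merge" not in {label["name"] for label in pr["labels"]}:
            return  # the committer has not opted in
        status = requests.get(
            f"{API}/repos/{REPO}/commits/{pr['head']['sha']}/status",
            headers=HEADERS).json()
        if status["state"] == "success":
            requests.put(f"{API}/repos/{REPO}/pulls/{pr_number}/merge",
                         headers=HEADERS, json={"merge_method": "squash"})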

I thought I would throw this option out here, just to hear others' opinions (positive or negative) on using Azure DevOps.

Eric

-----Original Message-----
From: Wes McKinney <we...@gmail.com> 
Sent: Friday, June 28, 2019 12:06 PM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Ongoing Travis CI service degradation

Based on the discussion in
https://issues.apache.org/jira/browse/INFRA-18533 it does not appear to be ASF Infra's inclination to allow projects to donate money to the Foundation to get more build resources on Travis CI. Our likely only solution is going to be to reduce our dependence on Travis CI. In the short term, I would say that the sooner we can migrate all of our Linux builds to docker-compose form to aid in this transition, the better

We are hiring in our organization (Ursa Labs) for a dedicated role to support CI and development lifecycle automation (packaging, benchmarking, releases, etc.) in the Apache Arrow project, so I hope that we can provide even more help to resolve these issues in the future than we already are

On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Also note that the situation with AppVeyor isn't much better.
>
> Any "free as in beer" CI service is probably too capacity-limited for 
> our needs now, unless it allows private workers (which apparently 
> Gitlab CI does).
>
> Regards
>
> Antoine.
>
>
> Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> > It seems that there is intermittent Apache-wide degradation of 
> > Travis CI services -- I was looking at 
> > https://travis-ci.org/apache today and there appeared to
> > be a stretch of 3-4 hours where no queued builds on github.com/apache were running at all. I initially thought that the issue was contention with other Apache projects but even with round-robin allocation and a concurrency limit (e.g. no Apache project having more than 5-6 concurrent builds) that wouldn't explain why NO builds are running.
> >
> > This is obviously disturbing given how reliant we are on Travis CI 
> > to validate patches to be merged.
> >
> > I've opened a support ticket with Travis CI to see if they can 
> > provide some insight into what's going on. There is also an INFRA 
> > ticket where other projects have reported some similar experiences
> >
> > https://issues.apache.org/jira/browse/INFRA-18533
> >
> > As a meta-comment, at some point Apache Arrow is going to need to 
> > move off of public CI services for patch validation so that we can 
> > have unilateral control over scaling our build / test resources as 
> > the community grows larger. As the most active merger of patches (I 
> > have merged over 50% of pull requests over the project's history) 
> > this affects me greatly as I am often monitoring builds on many open 
> > PRs so that I can merge them as soon as possible. We are often 
> > resorting to builds on contributor's forks (assuming they have 
> > enabled Travis CI /
> > Appveyor)
> >
> > As some context around Travis CI in particular, in January Travis CI 
> > was acquired by Idera, a private equity (I think?) developer tools 
> > conglomerate. It's likely that we're seeing some "maximize profit, 
> > minimize costs" behavior in play, so the recent experience could 
> > become the new normal.
> >
> > - Wes
> >

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Wes McKinney <we...@gmail.com>.
Based on the discussion in
https://issues.apache.org/jira/browse/INFRA-18533 it does not appear
to be ASF Infra's inclination to allow projects to donate money to the
Foundation to get more build resources on Travis CI. Our likely only
solution is going to be to reduce our dependence on Travis CI. In the
short term, I would say that the sooner we can migrate all of our
Linux builds to docker-compose form to aid in this transition, the
better
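
As a rough illustration of what that buys us (a sketch under
assumptions, not the actual setup): once each Linux build is expressed
as a docker-compose service, any CI provider or developer machine only
needs a thin entry point like the one below, where the "cpp" service
name is a hypothetical example:

    # minimal sketch of a CI entry point driving docker-compose-based builds
    # (pass whatever service name the compose file defines, e.g. "cpp")
    import subprocess
    import sys

    def run_service(service):
        # build the image for the service, then run its test command;
        # check_call raises (failing the job) if either step exits non-zero
        subprocess.check_call(["docker-compose", "build", service])
        subprocess.check_call(["docker-compose", "run", "--rm", service])

    if __name__ == "__main__":
        run_service(sys.argv[1] if len(sys.argv) > 1 else "cpp")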

We are hiring in our organization (Ursa Labs) for a dedicated role to
support CI and development lifecycle automation (packaging,
benchmarking, releases, etc.) in the Apache Arrow project, so I hope
that we can provide even more help to resolve these issues in the
future than we already are

On Wed, Jun 26, 2019 at 11:35 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Also note that the situation with AppVeyor isn't much better.
>
> Any "free as in beer" CI service is probably too capacity-limited for
> our needs now, unless it allows private workers (which apparently Gitlab
> CI does).
>
> Regards
>
> Antoine.
>
>
> Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> > It seems that there is intermittent Apache-wide degradation of Travis
> > CI services -- I was looking at https://travis-ci.org/apache today and
> > there appeared to be a stretch of 3-4 hours where no queued builds on
> > github.com/apache were running at all. I initially thought that the
> > issue was contention with other Apache projects but even with
> > round-robin allocation and a concurrency limit (e.g. no Apache project
> > having more than 5-6 concurrent builds) that wouldn't explain why NO
> > builds are running.
> >
> > This is obviously disturbing given how reliant we are on Travis CI to
> > validate patches to be merged.
> >
> > I've opened a support ticket with Travis CI to see if they can provide
> > some insight into what's going on. There is also an INFRA ticket where
> > other projects have reported some similar experiences
> >
> > https://issues.apache.org/jira/browse/INFRA-18533
> >
> > As a meta-comment, at some point Apache Arrow is going to need to move
> > off of public CI services for patch validation so that we can have
> > unilateral control over scaling our build / test resources as the
> > community grows larger. As the most active merger of patches (I have
> > merged over 50% of pull requests over the project's history) this
> > affects me greatly as I am often monitoring builds on many open PRs so
> > that I can merge them as soon as possible. We are often resorting to
> > builds on contributor's forks (assuming they have enabled Travis CI /
> > Appveyor)
> >
> > As some context around Travis CI in particular, in January Travis CI
> > was acquired by Idera, a private equity (I think?) developer tools
> > conglomerate. It's likely that we're seeing some "maximize profit,
> > minimize costs" behavior in play, so the recent experience could
> > become the new normal.
> >
> > - Wes
> >

Re: [DISCUSS] Ongoing Travis CI service degradation

Posted by Antoine Pitrou <an...@python.org>.
Also note that the situation with AppVeyor isn't much better.

Any "free as in beer" CI service is probably too capacity-limited for
our needs now, unless it allows private workers (which apparently Gitlab
CI does).

Regards

Antoine.


Le 26/06/2019 à 18:32, Wes McKinney a écrit :
> It seems that there is intermittent Apache-wide degradation of Travis
> CI services -- I was looking at https://travis-ci.org/apache today and
> there appeared to be a stretch of 3-4 hours where no queued builds on
> github.com/apache were running at all. I initially thought that the
> issue was contention with other Apache projects but even with
> round-robin allocation and a concurrency limit (e.g. no Apache project
> having more than 5-6 concurrent builds) that wouldn't explain why NO
> builds are running.
> 
> This is obviously disturbing given how reliant we are on Travis CI to
> validate patches to be merged.
> 
> I've opened a support ticket with Travis CI to see if they can provide
> some insight into what's going on. There is also an INFRA ticket where
> other projects have reported some similar experiences
> 
> https://issues.apache.org/jira/browse/INFRA-18533
> 
> As a meta-comment, at some point Apache Arrow is going to need to move
> off of public CI services for patch validation so that we can have
> unilateral control over scaling our build / test resources as the
> community grows larger. As the most active merger of patches (I have
> merged over 50% of pull requests over the project's history) this
> affects me greatly as I am often monitoring builds on many open PRs so
> that I can merge them as soon as possible. We are often resorting to
> builds on contributor's forks (assuming they have enabled Travis CI /
> Appveyor)
> 
> As some context around Travis CI in particular, in January Travis CI
> was acquired by Idera, a private equity (I think?) developer tools
> conglomerate. It's likely that we're seeing some "maximize profit,
> minimize costs" behavior in play, so the recent experience could
> become the new normal.
> 
> - Wes
>