Posted to dev@arrow.apache.org by Krisztián Szűcs <sz...@gmail.com> on 2019/08/29 12:19:16 UTC

[PROPOSAL] Consolidate Arrow's CI configuration

Hi,

Arrow's current continuous integration setup uses multiple CI providers,
tools, and scripts:

 - Unit tests run on Travis and Appveyor.
 - Binary packaging builds run on crossbow, an abstraction over multiple
   CI providers driven through a GitHub repository.
 - For local tests and tasks, there is a docker-compose setup, or of
   course you can maintain your own environment.

This setup has run into some limitations:
 - It’s slow: the CI parallelism of Travis has degraded over the last
   couple of months. Testing a PR takes more than an hour, which is a
   long time for both maintainers and contributors, and it hurts
   development throughput.
 - Build configurations are not portable; they are tied to specific
   services. You can’t just take a Travis script and run it somewhere
   else.
 - Because they’re not portable, build configurations are duplicated in
   several places.
 - The Travis, Appveyor, and crossbow builds are not reproducible
   locally, so developing them requires slow git-push cycles.
 - Public CI has limited platform support; for example, ARM machines are
   not available.
 - Public CI also has limited hardware support; no GPUs are available.
Resolving all of the issues above is complicated, but it is a must for
the long-term sustainability of Arrow.

For some time, we’ve been working on a tool called Ursabot [1], a
library on top of the CI framework Buildbot [2]. Buildbot is well
maintained and widely used by complex projects, including CPython,
WebKit, LLVM, and MariaDB. Buildbot is not another hosted CI service
like Travis or Appveyor: it is an extensible framework for implementing
various automations, such as continuous integration tasks.

You’ve probably noticed additional “Ursabot” builds appearing on pull
requests, alongside the Travis and Appveyor builds. We’ve been testing
the framework with a fully featured CI server at ci.ursalabs.org. This
service runs build configurations we can’t run on Travis, runs them
faster than Travis, and has a GitHub comment-bot integration for ad hoc
build triggering.

While we’re not prepared to propose moving all CI to a self-hosted
setup, our work has demonstrated the potential of using Buildbot to
resolve Arrow’s continuous integration challenges:
 - The docker-based builders reuse docker images, which eliminates slow
   dependency installation steps. Some builds in this setup, run on
   Ursa Labs’s infrastructure, finish 20 minutes faster than the
   comparable Travis-CI jobs.
 - It’s scalable. We can deploy Buildbot anywhere and add more masters
   and workers, which we can’t do with public CI.
 - It’s platform- and CI-provider-independent. Builds can run on
   arbitrary architectures, operating systems, and hardware: Python is
   the only requirement. Additionally, builds specified in
   buildbot/ursabot can run anywhere: not only on custom buildbot
   infrastructure but also on Travis, or even on your own machine.
 - It improves reproducibility and encourages consolidation of
   configuration. You can run locally the exact job that ran on Travis,
   and you can even get an interactive shell in the build to debug a
   test failure. And because you can run the same job anywhere, we
   wouldn’t need duplicated, Travis-specific or docker-compose build
   configurations stored separately.
 - It’s extensible. More exotic features, such as a comment bot, a
   benchmark database, a benchmark dashboard, an artifact store, or
   integrations with other systems, are easily implementable within the
   same framework.

I’m proposing to donate the build configuration we’ve been iterating on
in Ursabot to the Arrow codebase. Here [3] is a patch that adds the
configuration. This will enable us to explore consolidating build
configuration using the Buildbot framework. A next step would be to
port a Travis build to Ursabot and, in the Travis configuration, execute
the build with the shell command `$ ursabot project build
<builder-name>`. This is the same way we would be able to execute the
build locally--something we can’t currently do with the Travis builds.
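
To make the idea concrete, the Travis side of that next step could look
roughly like the sketch below. Only the `ursabot project build
<builder-name>` command comes from this proposal; the package name and
the builder name are illustrative placeholders, not the actual
configuration.

```yaml
# .travis.yml (sketch, not the real Arrow configuration)
language: python
install:
  # Hypothetical package name for the donated tool.
  - pip install ursabot
script:
  # Delegate the actual build to the portable builder definition;
  # "cpp-conda" is a placeholder builder name.
  - ursabot project build cpp-conda
```

A developer could run the same `ursabot project build cpp-conda` command
locally to reproduce the Travis job.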

I am not proposing here that we stop using Travis-CI and Appveyor to run
CI for apache/arrow, though that may well be a direction we choose to go
in the future. Moving build configuration into something like Buildbot
would be a necessary first step toward that; that said, there are other
immediate benefits to porting build configuration into Buildbot: local
reproducibility, consolidation of build logic, independence from any
particular CI provider, and ease of using and maintaining faster,
Docker-based jobs. Self-hosting CI brings a number of other challenges,
which we will continue to explore concurrently, but we believe there are
benefits to adopting Buildbot build configuration regardless.

Regards, Krisztian

[1]: https://github.com/ursa-labs/ursabot
[2]: https://buildbot.net
     https://docs.buildbot.net
     https://github.com/buildbot/buildbot
[3]: https://github.com/apache/arrow/pull/5210

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Antoine Pitrou <an...@python.org>.

On 06/09/2019 at 16:18, Krisztián Szűcs wrote:
> On Fri, Sep 6, 2019 at 12:48 PM Antoine Pitrou <so...@pitrou.net> wrote:
> 
>> On Fri, 6 Sep 2019 12:41:15 +0200
>> Krisztián Szűcs <sz...@gmail.com> wrote:
>>>>
>>>> I get the impression that it is a complicated and fragile solution to
>>>> the problem.
>>>>
>>> Ursabot has a bunch of tests to ensure that we don't break any of the
>>> functionality, so fragility can be avoided by testing it.
>>
>> Testing lets you detect breakage, it doesn't make the chosen solution
>> less likely to break.
>>
>> I'm curious: is this officially supported by buildbot?  Or is it
>> something that happens to work?
>>
> If you mean changing the configuration from pull requests without
> restarting the master, then I don't think so. Usually the buildmaster
> configuration is hosted in a separate repo. I've seen examples of
> automatically reloading the configuration based on pull requests, but
> they reload the whole buildmaster process, not just certain parts of it.
> We can ask the buildbot community for advice about a proper
> implementation of this feature.

Indeed, I think it would be good to get their confirmation that we are
standing on firm ground here.

Regards

Antoine.

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Fri, Sep 6, 2019 at 12:48 PM Antoine Pitrou <so...@pitrou.net> wrote:

> On Fri, 6 Sep 2019 12:41:15 +0200
> Krisztián Szűcs <sz...@gmail.com> wrote:
> > >
> > > I get the impression that it is a complicated and fragile solution to
> > > the problem.
> > >
> > Ursabot has a bunch of tests to ensure that we don't break any of the
> > functionality, so fragility can be avoided by testing it.
>
> Testing lets you detect breakage, it doesn't make the chosen solution
> less likely to break.
>
> I'm curious: is this officially supported by buildbot?  Or is it
> something that happens to work?
>
If you mean changing the configuration from pull requests without
restarting the master, then I don't think so. Usually the buildmaster
configuration is hosted in a separate repo. I've seen examples of
automatically reloading the configuration based on pull requests, but
they reload the whole buildmaster process, not just certain parts of it.
We can ask the buildbot community for advice about a proper
implementation of this feature.

>
> Regards
>
> Antoine.
>
>
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Antoine Pitrou <so...@pitrou.net>.
On Fri, 6 Sep 2019 12:41:15 +0200
Krisztián Szűcs <sz...@gmail.com> wrote:
> >
> > I get the impression that it is a complicated and fragile solution to
> > the problem.
> >  
> Ursabot has a bunch of tests to ensure that we don't break any of the
> functionality, so fragility can be avoided by testing it.

Testing lets you detect breakage, it doesn't make the chosen solution
less likely to break.

I'm curious: is this officially supported by buildbot?  Or is it
something that happens to work?

Regards

Antoine.



Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Fri, Sep 6, 2019 at 12:15 PM Antoine Pitrou <an...@python.org> wrote:

>
> On 06/09/2019 at 12:13, Krisztián Szűcs wrote:
> > On Fri, Sep 6, 2019 at 12:01 PM Antoine Pitrou <an...@python.org>
> wrote:
> >
> >>
> >> On 06/09/2019 at 10:07, Krisztián Szűcs wrote:
> >>> For example trigger a builder for changes affecting files under
> arrow/ci
> >>> which reloads the builder object within the build master's process.
> >>
> >> I am asking you how this affects only the current build and not other
> >> concurrent builds.
> >
> > We need to register the changed builders as new ones with a corresponding
> > triggerable scheduler, and trigger them.
>
> Don't you still need to restart the buildmaster to see those new builders?
>
Nope.

>
> I get the impression that it is a complicated and fragile solution to
> the problem.
>
Ursabot has a bunch of tests to ensure that we don't break any of the
functionality, so fragility can be avoided by testing it. There might be
easier solutions, like spinning up another master, but choosing the
right way to do it requires some experimenting.

>
> Regards
>
> Antoine.
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Antoine Pitrou <an...@python.org>.
On 06/09/2019 at 12:13, Krisztián Szűcs wrote:
> On Fri, Sep 6, 2019 at 12:01 PM Antoine Pitrou <an...@python.org> wrote:
> 
>>
>> On 06/09/2019 at 10:07, Krisztián Szűcs wrote:
>>> For example trigger a builder for changes affecting files under arrow/ci
>>> which reloads the builder object within the build master's process.
>>
>> I am asking you how this affects only the current build and not other
>> concurrent builds.
> 
> We need to register the changed builders as new ones with a corresponding
> triggerable scheduler, and trigger them.

Don't you still need to restart the buildmaster to see those new builders?

I get the impression that it is a complicated and fragile solution to
the problem.

Regards

Antoine.

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Fri, Sep 6, 2019 at 12:01 PM Antoine Pitrou <an...@python.org> wrote:

>
> On 06/09/2019 at 10:07, Krisztián Szűcs wrote:
> > For example trigger a builder for changes affecting files under arrow/ci
> > which reloads the builder object within the build master's process.
>
> I am asking you how this affects only the current build and not other
> concurrent builds.

We need to register the changed builders as new ones with a corresponding
triggerable scheduler, and trigger them.

>
>
Regards
>
> Antoine.
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Antoine Pitrou <an...@python.org>.
On 06/09/2019 at 10:07, Krisztián Szűcs wrote:
> For example trigger a builder for changes affecting files under arrow/ci
> which reloads the builder object within the build master's process.

I am asking you how this affects only the current build and not other
concurrent builds.

Regards

Antoine.

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
For example, trigger a builder for changes affecting files under arrow/ci
which reloads the builder object within the build master's process. We
are not limited to shell commands; arbitrary Python functions can be
executed too, but the semantics would be similar to running
MasterShellCommand [1].

[1]:
http://docs.buildbot.net/2.4.0/manual/configuration/buildsteps.html#running-commands-on-the-master
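
A minimal sketch of that idea in a buildbot master configuration, using
only standard buildbot primitives; the scheduler, builder, and worker
names, the repository path, and the `buildbot reconfig` reload strategy
are illustrative assumptions (the reload mechanism is exactly what is
being debated in this thread), not the actual ursabot implementation:

```python
# master.cfg fragment (sketch)
from buildbot.plugins import schedulers, steps, util

def ci_files_changed(change):
    # React only to commits touching files under the ci/ directory.
    return any(f.startswith("ci/") for f in change.files)

reload_factory = util.BuildFactory([
    steps.Git(repourl="https://github.com/apache/arrow",
              mode="incremental"),
    # MasterShellCommand runs on the master, not a worker; here it asks
    # buildbot to re-read its configuration (one possible strategy).
    steps.MasterShellCommand(
        command=["buildbot", "reconfig", "/path/to/master"]),
])

c["schedulers"].append(schedulers.SingleBranchScheduler(
    name="reload-on-ci-change",
    change_filter=util.ChangeFilter(branch="master"),
    fileIsImportant=ci_files_changed,
    builderNames=["reload-builders"],
))
c["builders"].append(util.BuilderConfig(
    name="reload-builders",
    workernames=["local-worker"],
    factory=reload_factory,
))
```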

On Thu, Sep 5, 2019 at 3:17 PM Antoine Pitrou <an...@python.org> wrote:

>
> On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
> >>
> >> If going with buildbot, this means that the various build steps need to
> >> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
> >> "test", "after-test"...) and their contents expressed outside of the
> >> buildmaster configuration per se.
> >>
> > This is partially resolved with the Builder abstraction, see an example
> > here [1]. We just need to add and reload these Builder configurations
> > dynamically on certain events, like when someone changes a builder
> > from a PR.
>
> This is inside the buildmaster process, right?  I don't understand how
> you plan to change those dynamically without affecting all concurrent
> builds.
>
> Regards
>
> Antoine.
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Fri, Sep 6, 2019 at 12:23 AM Wes McKinney <we...@gmail.com> wrote:

> hi Krisztian,
>
> Anyone who's developing in the project can see that the Buildbot setup
> is working well (at least for Linux builds) and giving much more
> timely feedback, which has been very helpful.
>
> I'm concerned about the "ursabot" approach for a few reasons:
>
> * If we are to centralize our tooling for Arrow CI builds, why can we
> not have the build tool itself under Arrow governance?
> * The current "ursabot" tool has GPL dependencies. Can these be
> factored out into plugins so that the tool itself is ASF-compatible?
> * This is a bit nitpicky but the name "ursabot" bears the name mark of
> an organization that funds developers in this project. I'm concerned
> about this, as I would about a tool named "clouderabot", "dremiobot",
> "databricksbot", "googlebot", "ibmbot" or anything like that. It's
> different from using a tool developed by an unaffiliated third party
>
> In any case, I think putting the build configurations for the current
> Ursa Labs-managed build cluster in the Apache Arrow repository is a
> good idea, but there are likely a number of issues that we need to
> address to be able to contemplate having a hard dependency between the
> CI that we depend on to merge patches and this tool.
>
How should we move forward with the donation?
Should we have a separate thread for voting?
Do we need any special steps for the IP clearance?

>
> - Wes
>
> On Thu, Sep 5, 2019 at 8:17 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
> > >>
> > >> If going with buildbot, this means that the various build steps need
> to
> > >> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
> > >> "test", "after-test"...) and their contents expressed outside of the
> > >> buildmaster configuration per se.
> > >>
> > > This is partially resolved with the Builder abstraction, see an example
> > > here [1]. We just need to add and reload these Builder configurations
> > > dynamically on certain events, like when someone changes a builder
> > > from a PR.
> >
> > This is inside the buildmaster process, right?  I don't understand how
> > you plan to change those dynamically without affecting all concurrent
> > builds.
> >
> > Regards
> >
> > Antoine.
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Fri, Sep 6, 2019 at 7:56 PM Wes McKinney <we...@gmail.com> wrote:

> On Fri, Sep 6, 2019 at 3:18 AM Krisztián Szűcs
> <sz...@gmail.com> wrote:
> >
> > Hey Wes,
> >
> > On Fri, Sep 6, 2019 at 12:23 AM Wes McKinney <we...@gmail.com>
> wrote:
> >
> > > hi Krisztian,
> > >
> > > Anyone who's developing in the project can see that the Buildbot setup
> > > is working well (at least for Linux builds) and giving much more
> > > timely feedback, which has been very helpful.
> > >
> > > I'm concerned about the "ursabot" approach for a few reasons:
> > >
> > > * If we are to centralize our tooling for Arrow CI builds, why can we
> > > not have the build tool itself under Arrow governance?
> > >
> > See below.
> >
> > > * The current "ursabot" tool has GPL dependencies. Can these be
> > > factored out into plugins so that the tool itself is ASF-compatible?
> >
> > Ursabot is actually a buildbot plugin, however it contains some vendored
> > code from buildbot. If we can push those fixes upstream to buildbot, then
> > ursabot can be ASF compatible, thus may be maintained within arrow.
> >
> > > * This is a bit nitpicky but the name "ursabot" bears the name mark of
> > > an organization that funds developers in this project. I'm concerned
> > > about this, as I would about a tool named "clouderabot", "dremiobot",
> > > "databricksbot", "googlebot", "ibmbot" or anything like that. It's
> > > different from using a tool developed by an unaffiliated third party
> > >
> > Ursa Labs is focused on the development of Arrow, so I think it is a
> > bit different from your examples.
> > We can rename it if you want, or resolve the licensing of ursabot (push
> > all of the vendored code to buildbot), then donate it to Arrow.
> >
>
> You're suggesting that one organization that contributes to Apache
> Arrow deserves preferential treatment over others. This is not
> consistent with Apache project independence
>
It wasn't my intention to suggest that.

We just need a repository for the `arrow buildbot plugin` and a GitHub
user to interact with. We can move this repository to any GitHub
organization or user, and we can pick an arbitrary name for both the
repository and the GitHub machine account.

>
> https://community.apache.org/projectIndependence.html
>
> "Apache projects must be managed independently, and PMCs must ensure
> that they are acting in the best interests of the project as a whole.
> Note that it is similarly important that the PMC clearly show this
> independence within their project community. The perception of
> existing and new participants within the community that the PMC is run
> independently and without favoring any specific third parties over
> others is important, to allow new contributors to feel comfortable
> both joining the community and contributing their work. A community
> that obviously favors one specific vendor in some exclusive way will
> often discourage new contributors from competing vendors, which is an
> issue for the long term health of the project."
>
> > >
> > > In any case, I think putting the build configurations for the current
> > > Ursa Labs-managed build cluster in the Apache Arrow repository is a
> > > good idea, but there are likely a number of issues that we need to
> > > address to be able to contemplate having a hard dependency between the
> > > CI that we depend on to merge patches and this tool.
> > >
> > > - Wes
> > >
> > > On Thu, Sep 5, 2019 at 8:17 AM Antoine Pitrou <an...@python.org>
> wrote:
> > > >
> > > >
> > > > On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
> > > > >>
> > > > >> If going with buildbot, this means that the various build steps
> need
> > > to
> > > > >> be generic like in Travis-CI (e.g. "install", "setup",
> "before-test",
> > > > >> "test", "after-test"...) and their contents expressed outside of
> the
> > > > >> buildmaster configuration per se.
> > > > >>
> > > > > This is partially resolved with the Builder abstraction, see an
> example
> > > > > here [1]. We just need to add and reload these Builder
> configurations
> > > > > dynamically on certain events, like when someone changes a builder
> > > > > from a PR.
> > > >
> > > > This is inside the buildmaster process, right?  I don't understand
> how
> > > > you plan to change those dynamically without affecting all concurrent
> > > > builds.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > >
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Wes McKinney <we...@gmail.com>.
On Fri, Sep 6, 2019 at 3:18 AM Krisztián Szűcs
<sz...@gmail.com> wrote:
>
> Hey Wes,
>
> On Fri, Sep 6, 2019 at 12:23 AM Wes McKinney <we...@gmail.com> wrote:
>
> > hi Krisztian,
> >
> > Anyone who's developing in the project can see that the Buildbot setup
> > is working well (at least for Linux builds) and giving much more
> > timely feedback, which has been very helpful.
> >
> > I'm concerned about the "ursabot" approach for a few reasons:
> >
> > * If we are to centralize our tooling for Arrow CI builds, why can we
> > not have the build tool itself under Arrow governance?
> >
> See below.
>
> > * The current "ursabot" tool has GPL dependencies. Can these be
> > factored out into plugins so that the tool itself is ASF-compatible?
>
> Ursabot is actually a buildbot plugin, however it contains some vendored
> code from buildbot. If we can push those fixes upstream to buildbot, then
> ursabot can be ASF compatible, thus may be maintained within arrow.
>
> > * This is a bit nitpicky but the name "ursabot" bears the name mark of
> > an organization that funds developers in this project. I'm concerned
> > about this, as I would about a tool named "clouderabot", "dremiobot",
> > "databricksbot", "googlebot", "ibmbot" or anything like that. It's
> > different from using a tool developed by an unaffiliated third party
> >
> Ursa Labs is focused on the development of Arrow, so I think it is a
> bit different from your examples.
> We can rename it if you want, or resolve the licensing of ursabot (push
> all of the vendored code to buildbot), then donate it to Arrow.
>

You're suggesting that one organization that contributes to Apache
Arrow deserves preferential treatment over others. This is not
consistent with Apache project independence

https://community.apache.org/projectIndependence.html

"Apache projects must be managed independently, and PMCs must ensure
that they are acting in the best interests of the project as a whole.
Note that it is similarly important that the PMC clearly show this
independence within their project community. The perception of
existing and new participants within the community that the PMC is run
independently and without favoring any specific third parties over
others is important, to allow new contributors to feel comfortable
both joining the community and contributing their work. A community
that obviously favors one specific vendor in some exclusive way will
often discourage new contributors from competing vendors, which is an
issue for the long term health of the project."

> >
> > In any case, I think putting the build configurations for the current
> > Ursa Labs-managed build cluster in the Apache Arrow repository is a
> > good idea, but there are likely a number of issues that we need to
> > address to be able to contemplate having a hard dependency between the
> > CI that we depend on to merge patches and this tool.
> >
> > - Wes
> >
> > On Thu, Sep 5, 2019 at 8:17 AM Antoine Pitrou <an...@python.org> wrote:
> > >
> > >
> > > On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
> > > >>
> > > >> If going with buildbot, this means that the various build steps need
> > to
> > > >> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
> > > >> "test", "after-test"...) and their contents expressed outside of the
> > > >> buildmaster configuration per se.
> > > >>
> > > > This is partially resolved with the Builder abstraction, see an example
> > > > here [1]. We just need to add and reload these Builder configurations
> > > > dynamically on certain events, like when someone changes a builder
> > > > from a PR.
> > >
> > > This is inside the buildmaster process, right?  I don't understand how
> > > you plan to change those dynamically without affecting all concurrent
> > > builds.
> > >
> > > Regards
> > >
> > > Antoine.
> >

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
Hey Wes,

On Fri, Sep 6, 2019 at 12:23 AM Wes McKinney <we...@gmail.com> wrote:

> hi Krisztian,
>
> Anyone who's developing in the project can see that the Buildbot setup
> is working well (at least for Linux builds) and giving much more
> timely feedback, which has been very helpful.
>
> I'm concerned about the "ursabot" approach for a few reasons:
>
> * If we are to centralize our tooling for Arrow CI builds, why can we
> not have the build tool itself under Arrow governance?
>
See below.

> * The current "ursabot" tool has GPL dependencies. Can these be
> factored out into plugins so that the tool itself is ASF-compatible?

Ursabot is actually a buildbot plugin; however, it contains some vendored
code from buildbot. If we can push those fixes upstream to buildbot, then
ursabot can be ASF-compatible and thus may be maintained within Arrow.

> * This is a bit nitpicky but the name "ursabot" bears the name mark of
> an organization that funds developers in this project. I'm concerned
> about this, as I would about a tool named "clouderabot", "dremiobot",
> "databricksbot", "googlebot", "ibmbot" or anything like that. It's
> different from using a tool developed by an unaffiliated third party
>
Ursa Labs is focused on the development of Arrow, so I think it is a
bit different from your examples. We can rename it if you want, or
resolve the licensing of ursabot (push all of the vendored code to
buildbot), then donate it to Arrow.

>
> In any case, I think putting the build configurations for the current
> Ursa Labs-managed build cluster in the Apache Arrow repository is a
> good idea, but there are likely a number of issues that we need to
> address to be able to contemplate having a hard dependency between the
> CI that we depend on to merge patches and this tool.
>
> - Wes
>
> On Thu, Sep 5, 2019 at 8:17 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
> > >>
> > >> If going with buildbot, this means that the various build steps need
> to
> > >> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
> > >> "test", "after-test"...) and their contents expressed outside of the
> > >> buildmaster configuration per se.
> > >>
> > > This is partially resolved with the Builder abstraction, see an example
> > > here [1]. We just need to add and reload these Builder configurations
> > > dynamically on certain events, like when someone changes a builder
> > > from a PR.
> >
> > This is inside the buildmaster process, right?  I don't understand how
> > you plan to change those dynamically without affecting all concurrent
> > builds.
> >
> > Regards
> >
> > Antoine.
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Wes McKinney <we...@gmail.com>.
hi Krisztian,

Anyone who's developing in the project can see that the Buildbot setup
is working well (at least for Linux builds) and giving much more
timely feedback, which has been very helpful.

I'm concerned about the "ursabot" approach for a few reasons:

* If we are to centralize our tooling for Arrow CI builds, why can we
not have the build tool itself under Arrow governance?
* The current "ursabot" tool has GPL dependencies. Can these be
factored out into plugins so that the tool itself is ASF-compatible?
* This is a bit nitpicky but the name "ursabot" bears the name mark of
an organization that funds developers in this project. I'm concerned
about this, as I would about a tool named "clouderabot", "dremiobot",
"databricksbot", "googlebot", "ibmbot" or anything like that. It's
different from using a tool developed by an unaffiliated third party

In any case, I think putting the build configurations for the current
Ursa Labs-managed build cluster in the Apache Arrow repository is a
good idea, but there are likely a number of issues that we need to
address to be able to contemplate having a hard dependency between the
CI that we depend on to merge patches and this tool.

- Wes

On Thu, Sep 5, 2019 at 8:17 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
> >>
> >> If going with buildbot, this means that the various build steps need to
> >> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
> >> "test", "after-test"...) and their contents expressed outside of the
> >> buildmaster configuration per se.
> >>
> > This is partially resolved with the Builder abstraction, see an example
> > here [1]. We just need to add and reload these Builder configurations
> > dynamically on certain events, like when someone changes a builder
> > from a PR.
>
> This is inside the buildmaster process, right?  I don't understand how
> you plan to change those dynamically without affecting all concurrent
> builds.
>
> Regards
>
> Antoine.

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Antoine Pitrou <an...@python.org>.
On 05/09/2019 at 15:04, Krisztián Szűcs wrote:
>>
>> If going with buildbot, this means that the various build steps need to
>> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
>> "test", "after-test"...) and their contents expressed outside of the
>> buildmaster configuration per se.
>>
> This is partially resolved with the Builder abstraction, see an example
> here [1]. We just need to add and reload these Builder configurations
> dynamically on certain events, like when someone changes a builder
> from a PR.

This is inside the buildmaster process, right?  I don't understand how
you plan to change those dynamically without affecting all concurrent
builds.

Regards

Antoine.

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
Hey Antoine,

On Thu, Sep 5, 2019 at 2:54 PM Antoine Pitrou <an...@python.org> wrote:

>
> On 05/09/2019 at 14:43, Uwe L. Korn wrote:
> > Hello Krisztián,
> >
> >> On 05.09.2019 at 14:22, Krisztián Szűcs <szucs.krisztian@gmail.com> wrote:
> >>
> >>> * The build configuration is automatically updated on a merge to
> master?
> >>>
> >> Not yet, but this can be automated too with buildbot itself.
> >
> > This is something I would actually like to have before getting rid of
> > the Travis jobs. Otherwise we would be constrained quite a bit in
> > development when master CI breaks because of an environment issue,
> > until one of the few people who can update the config becomes available.
>
> I would go further and say that PRs and branches need to be able to run
> different build configurations.  We are moving too fast to afford an
> inflexible centralized configuration.
>
Agreed. I haven't had time to work on it yet, although I have a couple of
solutions in mind. Once we decide to move forward with this proposal, we
can allocate time to resolve it.

>
> If going with buildbot, this means that the various build steps need to
> be generic like in Travis-CI (e.g. "install", "setup", "before-test",
> "test", "after-test"...) and their contents expressed outside of the
> buildmaster configuration per se.
>
This is partially resolved by the Builder abstraction; see an example
here [1]. We just need to add and reload these Builder configurations
dynamically on certain events, like when someone changes a builder
from a PR.

[1]:
https://github.com/apache/arrow/blob/305e7387d429f095019c74f17e0c9c7cb443bb70/ci/buildbot/arrow/builders.py#L366
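
For readers without the link handy, a builder with generic, Travis-like
phases can be sketched with plain buildbot primitives roughly as below.
The builder, worker, script, and command names are illustrative
assumptions; the real definitions live in the linked builders.py, not
here.

```python
# Sketch of a builder with generic "install" / "setup" / "test" phases
# (placeholder names), expressed with standard buildbot primitives.
from buildbot.plugins import steps, util

factory = util.BuildFactory([
    steps.Git(repourl="https://github.com/apache/arrow", mode="full"),
    # Each phase is a named shell step; the scripts are placeholders.
    steps.ShellCommand(name="install", command=["ci/install.sh"]),
    steps.ShellCommand(name="setup",
                       command=["cmake", "-DARROW_PYTHON=ON", "cpp"]),
    steps.ShellCommand(name="test",
                       command=["ctest", "--output-on-failure"]),
])

builder = util.BuilderConfig(
    name="amd64-conda-cpp",      # placeholder builder name
    workernames=["docker-worker"],
    factory=factory,
)
```

Reloading such Builder objects in response to PR changes, without
restarting the master, is the open question discussed in this thread.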


>
> Regards
>
> Antoine.
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Antoine Pitrou <an...@python.org>.
On 05/09/2019 at 14:43, Uwe L. Korn wrote:
> Hello Krisztián,
> 
>> On 05.09.2019 at 14:22, Krisztián Szűcs <sz...@gmail.com> wrote:
>>
>>> * The build configuration is automatically updated on a merge to master?
>>>
>> Not yet, but this can be automated too with buildbot itself.
> 
> This is something I would actually like to have before getting rid of the Travis jobs. Otherwise we would be constrained quite a bit in development when master CI breaks because of an environment issue, until one of the few people who can update the config becomes available.

I would go further and say that PRs and branches need to be able to run
different build configurations.  We are moving too fast to afford an
inflexible centralized configuration.

If going with buildbot, this means that the various build steps need to
be generic like in Travis-CI (e.g. "install", "setup", "before-test",
"test", "after-test"...) and their contents expressed outside of the
buildmaster configuration per se.

Regards

Antoine.

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Krisztián,

> Am 05.09.2019 um 14:22 schrieb Krisztián Szűcs <sz...@gmail.com>:
> 
>> * The build configuration is automatically updated on a merge to master?
>> 
> Not yet, but this can be automatized too with buildbot itself.

This is something I would  actually like to have before getting rid of the Travis jobs. Otherwise we would be constrainted quite a bit in development when master CI breaks because of an environment issue until one of the few people who can update the config become available.

Uwe 




Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
Hey Uwe,

On Thu, Sep 5, 2019 at 1:49 PM Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Krisztián,
>
> I like this proposal. CI coverage and response time are crucial for
> the health of the project. In general I like the consolidation and local
> reproducibility of the builds. Some questions I wanted to ask to make sure
> I understand your proposal correctly (hopefully they all can be answered
> with a simple yes):
>
> * Windows builds will stay in Appveyor for now?
>
Yes. Afterwards I'd go with the following steps:
1. Port the AppVeyor configurations to buildbot and run them on
   AppVeyor with `ursabot project build windows-builder-name`
2. Once we have Windows workers, and they are reliable, we can
   decommission the AppVeyor builds.

> * MacOS builds will stay in Travis?
>
Yes, same as above.

> * All other builds will be removed from Travis?

Not all of the Travis builds have been ported to buildbot yet, namely:
c_glib, ruby, and the format integration tests.
I suggest an incremental procedure: once a Travis build is ported to
buildbot, we can choose to keep running it on Travis or to disable it
there. In that case Travis would only be a hosting provider.

> * Machines are currently run and funded by UrsaLabs but others could also
> sponsor an instance that could be added to the setup?
>
Exactly, either in the cloud or on bare machines; buildbot enables
us to scale our cluster pretty easily.

> * The build configuration is automatically updated on a merge to master?
>
Not yet, but this can be automatized too with buildbot itself.

>
> And then a not so simple one: What will happen to our current
> docker-compose setup? From the PR it seems like we do similar things with
> ursabot but not using the central docker-compose.yml?
>
Currently we're using docker-compose to run one-off containers rather
than long-running, multi-container services (which docker-compose is
designed for). Ursabot already supports the features we need from
docker-compose, so it can effectively replace the docker-compose
setup as well. We have low-level control over the docker API, so we
are able to tailor it to our requirements.
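As a rough illustration of what "one-off containers" means in practice (a hedged sketch — the function, image name, and environment variable below are invented for the example, not taken from the Arrow or ursabot code), a thin layer over the docker CLI/API only needs to compose a run-once invocation:

```python
# Hypothetical sketch: rendering a one-off `docker run` invocation from a
# service-like spec, the way a thin docker layer can replace
# docker-compose for run-once containers.
def one_off_command(image, command, volumes=None, env=None):
    """Render the argv for a one-off container that is removed on exit."""
    argv = ["docker", "run", "--rm"]
    for host, guest in (volumes or {}).items():
        argv += ["-v", f"{host}:{guest}"]
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]
    argv.append(image)
    argv += command
    return argv


argv = one_off_command(
    "arrow:cpp",                       # assumed image name, for illustration
    ["ctest", "--output-on-failure"],
    volumes={"/src/arrow": "/arrow"},
    env={"ARROW_BUILD_TYPE": "debug"},  # assumed variable, for illustration
)
print(" ".join(argv))
```

The container exits when the command finishes and `--rm` cleans it up, which is the usage pattern in question, as opposed to docker-compose's long-running service model.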


Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Krisztián, 

I like this proposal. CI coverage and response time are crucial for the health of the project. In general I like the consolidation and local reproducibility of the builds. Some questions I wanted to ask to make sure I understand your proposal correctly (hopefully they all can be answered with a simple yes):

* Windows builds will stay in Appveyor for now?
* MacOS builds will stay in Travis?
* All other builds will be removed from Travis?
* Machines are currently run and funded by UrsaLabs but others could also sponsor an instance that could be added to the setup?
* The build configuration is automatically updated on a merge to master?

And then a not so simple one: What will happen to our current docker-compose setup? From the PR it seems like we do similar things with ursabot but not using the central docker-compose.yml?


Cheers
Uwe



Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Sat, Sep 7, 2019 at 9:54 AM Sutou Kouhei <ko...@clear-code.com> wrote:

> Hi,
>
> I may have Ursabot experience because I've tried to create a
> Ursabot configuration for GLib:
>
>   https://github.com/ursa-labs/ursabot/pull/172

Which is great, thanks for doing that!

>
>
> I like the proposal to consolidate CI configuration into
> the Arrow repository. But I like the current docker-compose
> based approach for describing how to run each CI job.
>
> I know that Krisztián pointed out docker-compose based
> approach has a problem in Docker image dependency
> resolution.
>
>
> https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E
>
> > The "docker-compose setup"
> > --------------------------
> > ...
> > However docker-compose is not suitable for building and running
> > hierarchical
> > images. This is why we have added Makefile [1] to execute a "build" with
> a
> > single make command instead of manually executing multiple commands
> > involving
> > multiple images (which is error prone). It can also leave a lot of
> garbage
> > after both containers and images.
> > ...
> > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
>
> But while I created the c_glib configuration for Ursabot, I felt
> that I would rather use a widely used approach than our project-specific
> Python-based DSL. If we can use a widely used approach, we can apply it
> in other projects as well, which decreases the learning cost.
>
> I also felt that creating a class for each command, for the sake of a
> readable DSL, is over-engineering. I know that I could use the raw
> ShellCommand class, but that would break consistency.

I've added those command aliases mostly for convenience, and so that we
don't forget to customize them a bit. Each command can be customized,
e.g. to parse the number of failing/warning/succeeded test cases from
a step and create a summary, which greatly improves the readability
of the build log. The steps can also set different behaviours for
different states, use locks across the whole CI, and do other dynamic
things like triggering other schedulers.
These commands are not shell commands; we can represent more with
buildbot build steps than with shell scripts. The conversion would also
work from buildbot BuildSteps to bash scripts by mocking out the
non-ShellCommand steps. Thus the buildbot DSL can be exported as a shell
script, whereas a shell script cannot represent certain logic that is
useful for the hosted build master.
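That export direction can be sketched as follows (a hypothetical illustration of the idea, not ursabot code — the step classes here are invented stand-ins for buildbot's ShellCommand and master-side steps):

```python
# Hypothetical sketch: exporting a sequence of build steps to a bash
# script, keeping shell commands and mocking out (skipping) the steps
# that only make sense on the build master.
class ShellCommand:
    def __init__(self, *argv):
        self.argv = argv


class GitHubReport:  # invented example of a non-shell, master-side step
    pass


def to_bash(steps):
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for step in steps:
        if isinstance(step, ShellCommand):
            lines.append(" ".join(step.argv))
        else:
            # master-side steps have no shell equivalent; note and skip them
            lines.append(f"# skipped non-shell step: {type(step).__name__}")
    return "\n".join(lines)


script = to_bash([
    ShellCommand("cmake", "-GNinja", ".."),
    ShellCommand("ninja", "test"),
    GitHubReport(),
])
print(script)
```

The reverse direction is the limitation being described: a bash script cannot express locks, state-dependent behaviour, or scheduler triggers, so the richer step representation has to be the source of truth.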

>
> For example:
>
>   Creating Meson class to run meson command:
>
> https://github.com/ursa-labs/ursabot/pull/172/files#diff-663dab3e9eab42dfac85d2fdb69c7e95R313-R315
>
> How about just creating a wrapper script for docker-compose
> instead of creating a DSL?
>
I've also tried to figure out a way to reuse the bits from the
docker-compose setup, but after some time I realised that it would be
easier to generate bash scripts and docker-compose.yml from the buildbot
DSL, because the DSL represents more abstractions.
Additionally, docker-compose was not convenient on first use either; it
took a couple of iterations to reach the current state, which balances
between the limitations of docker-compose and Arrow's requirements.
While docker-compose and the docker builders would work for Linux and
Windows builds, other platforms would fall short.

>
> For example, we will be able to use labels [labels] to put
> metadata on each image:
>
> ----
> diff --git a/arrow-docker-compose b/arrow-docker-compose
> new file mode 100755
> index 000000000..fcb7f5e37
> --- /dev/null
> +++ b/arrow-docker-compose
> @@ -0,0 +1,13 @@
> +#!/usr/bin/env ruby
> +
> +require "yaml"
> +
> +if ARGV == ["build", "c_glib"]
> +  config = YAML.load(File.read("docker-compose.yml"))
> +  from =
> config["services"]["c_glib"]["build"]["labels"]["org.apache.arrow.from"]
> +  if from
> +    system("docker-compose", "build", from)
> +  end
> +end
> +system("docker-compose", *ARGV)
> diff --git a/docker-compose.yml b/docker-compose.yml
> index 4f3f4128a..acd649a19 100644
> --- a/docker-compose.yml
> +++ b/docker-compose.yml
> @@ -103,6 +103,8 @@ services:
>      build:
>        context: .
>        dockerfile: c_glib/Dockerfile
> +      labels:
> +        "org.apache.arrow.from": cpp
>      volumes: *ubuntu-volumes
>
>    cpp:
> ----
>
> "./arrow-docker-compose build c_glib" runs
> "docker-compose build cpp" then
> "docker-compose build c_glib".
>
> [labels] https://docs.docker.com/compose/compose-file/#labels
>
>
> If we just have a convenient docker-compose wrapper, can we
> use raw Buildbot to run the docker-compose wrapper?
>
> I also know that Krisztián pointed out using docker-compose
> from Buildbot approach has some problems.
>
>
> https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E
>
> > Use docker-compose from ursabot?
> > --------------------------------
> >
> > So assume that we should use docker-compose commands in the buildbot
> > builders.
> > Then:
> > - there would be a single build step for all builders [2] (which means a
> >   single chunk of unreadable log) - it also complicates working with
> > esoteric
> >   builders like the on-demand crossbow trigger and the benchmark runner
> > - no possibility to customize the buildsteps (like aggregating the count
> of
> >   warnings)
> > - no time statistics for the steps which would make it harder to optimize
> > the
> >   build times
> > - to properly clean up the container some custom solution would be
> required
> > - if we'd need to introduce additional parametrizations to the
> >   docker-compose.yaml (for example to add other architectures) then it
> might
> >   require full yaml duplication
> > - exchanging data between the docker-compose container and builtbot
> would be
> >   more complicated, for example the benchmark comment reporter reads
> >   the result from a file, in order to do the same (reading structured
> > output on
> >   stdout and stderr from scripts is more error prone) mounted volumes are
> >   required, which brings the usual permission problems on linux.
> > - local reproducibility still requires manual intervention because the
> > scripts
> >   within the docker containers are not pausable, they exit and the steps
> > until
> >   the failed one must be re-executed* after ssh-ing into the running
> > container.
> > ...
> > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
>
> We can use "tail -f /dev/null", "docker-compose up -d cpp"
> and "docker-compose exec cpp .." to run commands step by
> step in a container. It'll solve the "single build step" related
> problems:
>
Actually Buildbot's docker builder works similarly: it spins up a
container from the image, starts a Buildbot worker inside, and
instruments it from outside.

> ---
> diff --git a/docker-compose.yml b/docker-compose.yml
> index 4f3f4128a..6b3218f5e 100644
> --- a/docker-compose.yml
> +++ b/docker-compose.yml
> @@ -114,6 +114,7 @@ services:
>      build:
>        context: .
>        dockerfile: cpp/Dockerfile
> +    command: tail -f /dev/null
>      volumes: *ubuntu-volumes
>
>    cpp-system-deps:
> ----
>
> ----
> % docker-compose up -d cpp
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank
> string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank
> string.
> Creating network "arrowkou_default" with the default driver
> Creating arrowkou_cpp_1 ... done
> % docker-compose exec cpp sh -c 'echo hello > /tmp/hello.txt'
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank
> string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank
> string.
> % docker-compose exec cpp cat /tmp/hello.txt
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank
> string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank
> string.
> hello
> % docker-compose down
> WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank
> string.
> WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank
> string.
> Stopping arrowkou_cpp_1 ... done
> Removing arrowkou_cpp_1 ... done
> Removing network arrowkou_default
> ----
>
I'm not saying that we couldn't or shouldn't invest time in a
docker-compose wrapper, or perhaps a docker-compose generator, but this
problem definitely has a couple of angles to view it from.

BTW I'm not sure how much time it took you to get familiar with the
buildbot DSL, but your PR is good as is and closely aligns with the
previous configs.

Thanks Kou!

>
>
> Thanks,
> --
> kou
>

Re: [PROPOSAL] Consolidate Arrow's CI configuration

Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,

I may have some Ursabot experience because I've tried to create an
Ursabot configuration for GLib:

  https://github.com/ursa-labs/ursabot/pull/172

I like the proposal to consolidate CI configuration into the
Arrow repository. But I prefer the current docker-compose based
approach for describing how to run each CI job.

I know that Krisztián pointed out that the docker-compose based
approach has a problem with Docker image dependency resolution.

  https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E

> The "docker-compose setup"
> --------------------------
> ...
> However docker-compose is not suitable for building and running
> hierarchical images. This is why we have added Makefile [1] to execute
> a "build" with a single make command instead of manually executing
> multiple commands involving multiple images (which is error prone). It
> can also leave a lot of garbage after both containers and images.
> ...
> [1]: https://github.com/apache/arrow/blob/master/Makefile.docker

While creating the c_glib configuration for Ursabot, I felt that I
would rather use a widely used approach than our project-specific
Python-based DSL. If we use a widely used approach, we can reuse it
in other projects, which decreases the learning cost.

I also felt that creating a class for each command to make the DSL
readable is over-engineering. I know that I could use the raw
ShellCommand class instead, but that would break consistency.

For example:

  Creating a Meson class to run the meson command:
  https://github.com/ursa-labs/ursabot/pull/172/files#diff-663dab3e9eab42dfac85d2fdb69c7e95R313-R315

How about just creating a wrapper script for docker-compose
instead of creating a DSL?

For example, we would be able to use labels [labels] to attach
metadata to each image:

----
diff --git a/arrow-docker-compose b/arrow-docker-compose
new file mode 100755
index 000000000..fcb7f5e37
--- /dev/null
+++ b/arrow-docker-compose
@@ -0,0 +1,13 @@
+#!/usr/bin/env ruby
+
+require "yaml"
+
+if ARGV == ["build", "c_glib"]
+  config = YAML.load(File.read("docker-compose.yml"))
+  from = config["services"]["c_glib"]["build"]["labels"]["org.apache.arrow.from"]
+  if from
+    system("docker-compose", "build", from)
+  end
+end
+system("docker-compose", *ARGV)
diff --git a/docker-compose.yml b/docker-compose.yml
index 4f3f4128a..acd649a19 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -103,6 +103,8 @@ services:
     build:
       context: .
       dockerfile: c_glib/Dockerfile
+      labels:
+        "org.apache.arrow.from": cpp
     volumes: *ubuntu-volumes
 
   cpp:
----

"./arrow-docker-compose build c_glib" runs
"docker-compose build cpp" then
"docker-compose build c_glib".

[labels] https://docs.docker.com/compose/compose-file/#labels
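
The wrapper idea can also be generalized beyond the hardcoded c_glib
case by following the labels transitively, so a service whose base
image itself has a base gets its whole chain built in order. Here is
a minimal sketch along the same lines; the build_chain helper and its
behavior are my assumption, not existing Arrow tooling:

```ruby
#!/usr/bin/env ruby

require "yaml"

FROM_LABEL = "org.apache.arrow.from"

# Walk the org.apache.arrow.from labels transitively and return the
# services in build order, base images first.
def build_chain(config, service)
  chain = []
  while service
    chain.unshift(service)
    labels = config.dig("services", service, "build", "labels") || {}
    service = labels[FROM_LABEL]
  end
  chain
end

unless ARGV.empty?
  if ARGV[0] == "build" && ARGV[1]
    config = YAML.load(File.read("docker-compose.yml"))
    # build each image in the chain, base images first
    build_chain(config, ARGV[1]).each do |service|
      system("docker-compose", "build", service) or exit(1)
    end
  else
    # fall through to plain docker-compose for everything else
    system("docker-compose", *ARGV)
  end
end
```

With the labels from the diff above, build_chain would resolve
c_glib to ["cpp", "c_glib"], matching the two-step build sequence.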


If we have a convenient docker-compose wrapper, can we just use
raw Buildbot and have it run the docker-compose wrapper?

I also know that Krisztián pointed out that the approach of using
docker-compose from Buildbot has some problems.

  https://lists.apache.org/thread.html/fd801fa85c3393edd0db415d70dbc4c3537a811ec8587a6fbcc842cd@%3Cdev.arrow.apache.org%3E

> Use docker-compose from ursabot?
> --------------------------------
>
> So assume that we should use docker-compose commands in the buildbot
> builders. Then:
> - there would be a single build step for all builders [2] (which means a
>   single chunk of unreadable log) - it also complicates working with
>   esoteric builders like the on-demand crossbow trigger and the
>   benchmark runner
> - no possibility to customize the buildsteps (like aggregating the
>   count of warnings)
> - no time statistics for the steps which would make it harder to
>   optimize the build times
> - to properly clean up the container some custom solution would be
>   required
> - if we'd need to introduce additional parametrizations to the
>   docker-compose.yaml (for example to add other architectures) then it
>   might require full yaml duplication
> - exchanging data between the docker-compose container and buildbot
>   would be more complicated, for example the benchmark comment reporter
>   reads the result from a file; in order to do the same (reading
>   structured output on stdout and stderr from scripts is more error
>   prone) mounted volumes are required, which brings the usual
>   permission problems on linux.
> - local reproducibility still requires manual intervention because the
>   scripts within the docker containers are not pausable; they exit and
>   the steps until the failed one must be re-executed* after ssh-ing
>   into the running container.
> ...
> [2]: https://ci.ursalabs.org/#/builders/87/builds/929

We can use "tail -f /dev/null", "docker-compose up -d cpp"
and "docker-compose exec cpp ..." to run commands step by
step in the container. It will solve the "single build step"
related problems:

----
diff --git a/docker-compose.yml b/docker-compose.yml
index 4f3f4128a..6b3218f5e 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -114,6 +114,7 @@ services:
     build:
       context: .
       dockerfile: cpp/Dockerfile
+    command: tail -f /dev/null
     volumes: *ubuntu-volumes
 
   cpp-system-deps:
----

----
% docker-compose up -d cpp
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
Creating network "arrowkou_default" with the default driver
Creating arrowkou_cpp_1 ... done
% docker-compose exec cpp sh -c 'echo hello > /tmp/hello.txt'
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
% docker-compose exec cpp cat /tmp/hello.txt
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
hello
% docker-compose down
WARNING: The CI_ARROW_SHA variable is not set. Defaulting to a blank string.
WARNING: The CI_ARROW_BRANCH variable is not set. Defaulting to a blank string.
Stopping arrowkou_cpp_1 ... done
Removing arrowkou_cpp_1 ... done
Removing network arrowkou_default
----
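
The pattern in the transcript above can be captured in a small
helper: start the long-running service, run each CI step as its own
exec (so each step gets its own log and timing in Buildbot), and
always tear the container down. A sketch under the same assumptions;
run_steps and the injectable runner are hypothetical names, not part
of any existing tool:

```ruby
#!/usr/bin/env ruby

# Run CI steps one at a time against a long-running service container.
# The runner is injected so the helper can be exercised without Docker.
def run_steps(service, steps, runner: ->(*cmd) { system(*cmd) })
  # the service's command is "tail -f /dev/null", so it stays alive
  runner.call("docker-compose", "up", "-d", service)
  begin
    steps.each do |step|
      # each step becomes a separate `exec`, i.e. a separate build step
      runner.call("docker-compose", "exec", "-T", service, *step)
    end
  ensure
    # always clean up, even if a step fails
    runner.call("docker-compose", "down")
  end
end
```

With the default runner this reproduces the transcript above; with a
stub runner the emitted command sequence can be inspected in tests.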


Thanks,
--
kou

In <CA...@mail.gmail.com>
  "[PROPOSAL] Consolidate Arrow's CI configuration" on Thu, 29 Aug 2019 14:19:16 +0200,
  Krisztián Szűcs <sz...@gmail.com> wrote:
