Posted to dev@arrow.apache.org by Krisztián Szűcs <sz...@gmail.com> on 2019/08/08 14:12:20 UTC

Re: Ursabot configuration within Arrow

Hi All!

Ursabot now supports debugging failed builds by attaching shells to the
still running containers right after a failing build step:

$ ursabot project build --attach-on-failure 'AMD64 Conda C++'

Local source/git directories can also be mounted into the builder
instead of cloning arrow, which makes debugging a lot easier:

$ ursabot project build -s ~/Workspace/arrow:. 'AMD64 Conda C++'

Mount destination `.` is relative to the build directory on the workers.
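
For illustration only (not ursabot's actual implementation), here is a rough
sketch of how a `-s <host-dir>:<dest>` option could translate into a docker
bind mount; the per-builder build directory path below is made up:

```python
import os

# Hypothetical build directory on the worker; the real path depends on the
# builder configuration.
BUILD_DIR = "/buildbot/amd64-conda-cpp/build"

def mount_spec(source, dest, build_dir=BUILD_DIR):
    """Expand the host path and resolve the destination relative to build_dir."""
    source = os.path.expanduser(source)
    dest = os.path.normpath(os.path.join(build_dir, dest))
    # docker-py's volumes format: {host_path: {"bind": container_path, "mode": ...}}
    return {source: {"bind": dest, "mode": "rw"}}

# "-s ~/Workspace/arrow:." mounts the local checkout over the build directory:
print(mount_spec("~/Workspace/arrow", "."))
# e.g. {'/home/user/Workspace/arrow': {'bind': '/buildbot/amd64-conda-cpp/build', 'mode': 'rw'}}
```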

The CI configuration for arrow is available here:
https://github.com/ursa-labs/ursabot/tree/master/projects/arrow

I'd like to proceed with the code donation, but I'm not sure what steps are
required.
I'd also like to receive feedback from other members, because this change
would heavily affect the future of arrow's continuous integration.

Regards, Krisztian

On Wed, Jul 31, 2019 at 8:59 PM Krisztián Szűcs <sz...@gmail.com>
wrote:

> We can now reproduce the builds locally (without the need of
> the web UI) with a single command:
>
> To demonstrate, building the master branch and building a pull
> request require the following commands:
>
> $ ursabot project build 'AMD64 Ubuntu 18.04 C++'
>
> $ ursabot project build -pr <num> 'AMD64 Ubuntu 18.04 C++'
>
> See the output here:
> https://travis-ci.org/ursa-labs/ursabot/builds/566057077#L988
>
> This effectively means that the builders defined in ursabot
> can be run directly on machines or CI services which have
> docker installed (with a single command).
> It also removes the need for the docker-compose setup.
>
> I'm going to write some documentation and prepare the arrow
> builders for a donation to the arrow codebase (which of course
> requires a vote).
>
> If anyone has a question please don't hesitate to ask!
>
> Regards, Krisztian
>
>
> On Tue, Jul 30, 2019 at 4:45 PM Krisztián Szűcs <sz...@gmail.com>
> wrote:
>
>> Ok, but moving the configuration to arrow is orthogonal to
>> the local reproducibility feature. Could we proceed with that?
>>
>> On Tue, Jul 30, 2019 at 4:38 PM Wes McKinney <we...@gmail.com> wrote:
>>
>>> I will defer to others to investigate this matter further but I would
>>> really like to see a concrete and practical path to local
>>> reproducibility before moving forward on any changes to our current
>>> CI.
>>>
>>> On Tue, Jul 30, 2019 at 7:38 AM Krisztián Szűcs
>>> <sz...@gmail.com> wrote:
>>> >
>>> > Fixed it and restarted a bunch of builds.
>>> >
>>> > On Tue, Jul 30, 2019 at 5:13 AM Wes McKinney <we...@gmail.com>
>>> wrote:
>>> >
>>> > > By the way, can you please disable the Buildbot builders that are
>>> > > causing builds on master to fail? We haven't had a passing build in
>>> > > over a week. Until we reconcile the build configurations we shouldn't
>>> > > be failing contributors' builds.
>>> > >
>>> > > On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney <we...@gmail.com>
>>> wrote:
>>> > > >
>>> > > > On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
>>> > > > <sz...@gmail.com> wrote:
>>> > > > >
>>> > > > > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <
>>> wesmckinn@gmail.com>
>>> > > wrote:
>>> > > > >
>>> > > > > > hi Krisztian,
>>> > > > > >
>>> > > > > > Before talking about any code donations or where to run
>>> builds, I
>>> > > > > > think we first need to discuss the worrisome situation where
>>> we have
>>> > > > > > in some cases 3 (or more) CI configurations for different
>>> components
>>> > > > > > in the project.
>>> > > > > >
>>> > > > > > Just taking into account our C++ build, we have:
>>> > > > > >
>>> > > > > > * A config for Travis CI
>>> > > > > > * Multiple configurations in Dockerfiles under cpp/
>>> > > > > > * A brand new (?) configuration in this third party
>>> ursa-labs/ursabot
>>> > > > > > repository
>>> > > > > >
>>> > > > > > I note for example that the "AMD64 Conda C++" Buildbot build is
>>> > > > > > failing while Travis CI is succeeding
>>> > > > > >
>>> > > > > > https://ci.ursalabs.org/#builders/66/builds/3196
>>> > > > > >
>>> > > > > > Starting from first principles, at least for Linux-based
>>> builds, what
>>> > > > > > I would like to see is:
>>> > > > > >
>>> > > > > > * A single build configuration (which can be driven by
>>> yaml-based
>>> > > > > > configuration files and environment variables), rather than 3
>>> like we
>>> > > > > > have now. This build configuration should be decoupled from
>>> any CI
>>> > > > > > platform, including Travis CI and Buildbot
>>> > > > > >
>>> > > > > Yeah, this would be the ideal setup, but I'm afraid the
>>> situation is a
>>> > > bit
>>> > > > > more complicated.
>>> > > > >
>>> > > > > TravisCI
>>> > > > > --------
>>> > > > >
>>> > > > > The Travis setup is constructed from a bunch of scripts optimized
>>> > > > > for Travis; it is slow and hardly compatible with any of the
>>> > > > > remaining setups. I think we should ditch it.
>>> > > > >
>>> > > > > The "docker-compose setup"
>>> > > > > --------------------------
>>> > > > >
>>> > > > > Most of the Dockerfiles are part of the docker-compose setup we've
>>> > > > > developed. This might be a good candidate to centralize our future
>>> > > > > setup around, mostly because docker-compose is widely used, and we
>>> > > > > could set up buildbot builders (or any other CIs) to execute the
>>> > > > > sequence of docker-compose build and docker-compose run commands.
>>> > > > > However, docker-compose is not suitable for building and running
>>> > > > > hierarchical images. This is why we have added a Makefile [1] to
>>> > > > > execute a "build" with a single make command instead of manually
>>> > > > > executing multiple commands involving multiple images (which is
>>> > > > > error prone). It can also leave a lot of garbage behind, both
>>> > > > > containers and images.
>>> > > > > Docker-compose shines when one needs to orchestrate multiple
>>> > > > > containers and their networks / volumes on the same machine. We
>>> > > > > made it work for arrow though (with a couple of hacky workarounds).
>>> > > > > Despite that, I still consider the docker-compose setup a good
>>> > > > > solution, mostly because of its biggest advantage: local
>>> > > > > reproducibility.
>>> > > > >
>>> > > >
>>> > > > I think what is missing here is an orchestration tool (for
>>> example, a
>>> > > > Python program) to invoke Docker-based development workflows
>>> involving
>>> > > > multiple steps.
>>> > > >
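
As an illustration of such an orchestration tool, here is a minimal sketch of
a Python program that drives a multi-step Docker workflow from the command
line. The job definition, image tag, and script paths are placeholders, not
Arrow's actual build setup:

```python
#!/usr/bin/env python
"""Toy build orchestrator: build an image, then run a job's steps inside it."""
import os
import subprocess
import sys

# Placeholder job definitions; a real tool would load these from yaml files.
JOBS = {
    "conda-cpp": {
        "dockerfile": "cpp/Dockerfile",         # hypothetical path
        "image": "arrow-dev/conda-cpp:latest",  # hypothetical tag
        "steps": ["ci/build_cpp.sh", "ci/test_cpp.sh"],  # hypothetical scripts
    },
}

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def main(job_name):
    job = JOBS[job_name]
    run(["docker", "build", "-f", job["dockerfile"], "-t", job["image"], "."])
    for step in job["steps"]:
        # Mount the repository so the same sources are used locally and in CI.
        run(["docker", "run", "--rm",
             "-v", "{}:/arrow".format(os.getcwd()), "-w", "/arrow",
             job["image"], step])

if __name__ == "__main__":
    main(sys.argv[1])
```

Invoked as `./build.py conda-cpp`, such a script would run the same steps
locally or on any CI service that has docker installed.
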
>>> > > > > Ursabot
>>> > > > > -------
>>> > > > >
>>> > > > > Ursabot uses low level docker commands to spin the containers up
>>> > > > > and down, and it also has a utility to nicely build the
>>> > > > > hierarchical images (with much less maintainable code). The
>>> > > > > builders are reliable, fast (thanks to docker), and it's great so
>>> > > > > far.
>>> > > > > Where it falls short compared to docker-compose is the lack of
>>> > > > > local reproducibility: currently the docker worker cleans up
>>> > > > > everything after itself except the mounted volumes used for
>>> > > > > caching. `docker-compose run` is a pretty nice way to shell into
>>> > > > > the container.
>>> > > > >
>>> > > > > Use docker-compose from ursabot?
>>> > > > > --------------------------------
>>> > > > >
>>> > > > > So assume that we use docker-compose commands in the buildbot
>>> > > > > builders. Then:
>>> > > > > - there would be a single build step for all builders [2] (which
>>> > > > >   means a single chunk of unreadable log) - it also complicates
>>> > > > >   working with esoteric
>>> > > >
>>> > > > I think this is too much of a black-and-white way of looking at
>>> > > > things. What I would like to see is a build orchestration tool,
>>> which
>>> > > > can be used via command line interface, not unlike the current
>>> > > > crossbow.py and archery command line scripts, that can invoke a
>>> build
>>> > > > locally or in a CI setting.
>>> > > >
>>> > > > >   builders like the on-demand crossbow trigger and the benchmark
>>> > > > >   runner
>>> > > > > - no possibility to customize the buildsteps (like aggregating the
>>> > > > >   count of warnings)
>>> > > > > - no time statistics for the steps, which would make it harder to
>>> > > > >   optimize the build times
>>> > > > > - to properly clean up the container some custom solution would be
>>> > > > >   required
>>> > > > > - if we'd need to introduce additional parametrizations to the
>>> > > > >   docker-compose.yaml (for example to add other architectures)
>>> > > > >   then it might require full yaml duplication
>>> > > >
>>> > > > I think the tool would need to be higher level than docker-compose
>>> > > >
>>> > > > In general I'm not very comfortable introducing a hard dependency
>>> on
>>> > > > Buildbot (or any CI platform, for that matter) into the project.
>>> So we
>>> > > > have to figure out a way to move forward without such hard
>>> dependency
>>> > > > or go back to the drawing board.
>>> > > >
>>> > > > > - exchanging data between the docker-compose container and
>>> > > > >   buildbot would be more complicated; for example, the benchmark
>>> > > > >   comment reporter reads the result from a file, and in order to
>>> > > > >   do the same (reading structured output on stdout and stderr from
>>> > > > >   scripts is more error prone) mounted volumes are required, which
>>> > > > >   brings the usual permission problems on linux.
>>> > > > > - local reproducibility still requires manual intervention
>>> > > > >   because the scripts within the docker containers are not
>>> > > > >   pausable: they exit, and the steps up to the failed one must be
>>> > > > >   re-executed* after ssh-ing into the running container.
>>> > > > >
>>> > > > > Honestly I see more issues than advantages here. Let's look at it
>>> > > > > the other way around.
>>> > > > >
>>> > > > > Local reproducibility with ursabot?
>>> > > > > -----------------------------------
>>> > > > >
>>> > > > > The most wanted feature that docker-compose has but ursabot
>>> > > > > doesn't is local reproducibility. First of all, ursabot can be run
>>> > > > > locally, including all of its builders, so local reproducibility
>>> > > > > is partially resolved. The missing piece is the interactive shell
>>> > > > > into the running container, because buildbot instantly stops the
>>> > > > > container and aggressively cleans up everything after it.
>>> > > > >
>>> > > > > I have three solutions / workarounds in mind:
>>> > > > >
>>> > > > > 1. We have all the power of docker and docker-compose from
>>> > > > >    ursabot through docker-py, and we can easily keep the container
>>> > > > >    running by simply not stopping it [3] (see the sketch after
>>> > > > >    this list). Configuring the locally running buildbot to keep
>>> > > > >    the containers running after a failure seems quite easy. *It
>>> > > > >    has the advantage that all of the build steps preceding the
>>> > > > >    failed one are already executed, so it requires less manual
>>> > > > >    intervention.
>>> > > > >    This could be done on the web UI or even from the CLI, like
>>> > > > >    `ursabot reproduce <builder-name>`
>>> > > > > 2. Generate the docker-compose.yaml and required scripts from the
>>> > > Ursabot
>>> > > > >    builder configurations, including the shell scripts.
>>> > > > > 3. Generate a set of commands to reproduce the failure (without
>>> > > > >    even asking the comment bot "how to reproduce the failing
>>> > > > >    one"). The response would look similar to:
>>> > > > >    ```bash
>>> > > > >    $ docker pull <image>
>>> > > > >    $ docker run -it <image> bash
>>> > > > >    # cmd1
>>> > > > >    # cmd2
>>> > > > >    # <- error occurs here ->
>>> > > > >    ```
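
Going back to option 1, here is a rough docker-py sketch of keeping the
container alive when a step fails so a shell can be attached to it
afterwards. The image name and step scripts are placeholders, and this is not
how buildbot's docker worker is actually implemented:

```python
import docker

def run_build(image, steps):
    """Run build steps in one container; on failure, leave it running for inspection."""
    client = docker.from_env()
    # One long-lived container for all steps, kept alive by a no-op command.
    container = client.containers.run(image, command="sleep infinity", detach=True)
    for step in steps:
        result = container.exec_run(step)
        print(result.output.decode())
        if result.exit_code != 0:
            # Do not stop/remove the container, so one can attach a shell to it.
            print("step failed; attach with: docker exec -it %s bash"
                  % container.short_id)
            return False
    container.stop()
    container.remove()
    return True

# Placeholder image and scripts:
run_build("arrow-dev/conda-cpp:latest", ["ci/build_cpp.sh", "ci/test_cpp.sh"])
```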
>>> > > > >
>>> > > > > TL;DR
>>> > > > > -----
>>> > > > > In the first iteration I'd remove the travis configurations.
>>> > > > > In the second iteration I'd develop a feature for ursabot to
>>> make local
>>> > > > > reproducibility possible.
>>> > > > >
>>> > > > > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
>>> > > > > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
>>> > > > > [3]:
>>> > > > >
>>> > >
>>> https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
>>> > > > >
>>> > > > > * A local tool to run any Linux-based builds locally using
>>> Docker at
>>> > > > > > the command line, so that CI behavior can be exactly reproduced
>>> > > > > > locally
>>> > > > > >
>>> > > > > > Does that seem achievable?
>>> > > > > >
>>> > > > > Thanks,
>>> > > > > > Wes
>>> > > > > >
>>> > > > > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
>>> > > > > > <sz...@gmail.com> wrote:
>>> > > > > > >
>>> > > > > > > Hi All,
>>> > > > > > >
>>> > > > > > > Ursabot works pretty well so far, and the CI feedback times
>>> > > > > > > have become even better* after enabling the docker volume
>>> > > > > > > caches, but its development and maintenance is still not
>>> > > > > > > available to the whole Arrow community.
>>> > > > > > >
>>> > > > > > > While it wasn't straightforward, I've managed to separate the
>>> > > > > > > source code required to configure the Arrow builders into a
>>> > > > > > > separate directory, which eventually can be donated to Arrow.
>>> > > > > > > The README is under construction, but the code is available
>>> > > > > > > here [1].
>>> > > > > > >
>>> > > > > > > As long as this codebase is not governed by the Arrow
>>> > > > > > > community, decommissioning the slow travis builds is not
>>> > > > > > > possible, so the overall CI times required to merge a PR will
>>> > > > > > > remain high.
>>> > > > > > >
>>> > > > > > > Regards, Krisztian
>>> > > > > > >
>>> > > > > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4
>>> > > > > > >   minutes
>>> > > > > > > * Python builder times have dropped from ~7-8 minutes to ~3-5
>>> > > > > > >   minutes
>>> > > > > > > * ARM C++ builder times have dropped from ~19-20 minutes to
>>> > > > > > >   ~9-12 minutes
>>> > > > > > >
>>> > > > > > > [1]:
>>> > > > > > >
>>> > > > > >
>>> > >
>>> https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
>>> > > > > >
>>> > >
>>>
>>

Re: Ursabot configuration within Arrow

Posted by Krisztián Szűcs <sz...@gmail.com>.
On Thu, Aug 8, 2019 at 4:24 PM Antoine Pitrou <an...@python.org> wrote:

>
> Le 08/08/2019 à 16:12, Krisztián Szűcs a écrit :
> > Hi All!
> >
> > Ursabot now supports debugging failed builds by attaching shells to the
> > still running containers right after a failing build step:
> >
> > $ ursabot project build --attach-on-failure 'AMD64 Conda C++'
> >
> > Local source/git directories can also be mounted into the builder
> > instead of cloning arrow, which makes debugging a lot easier:
> >
> > $ ursabot project build -s ~/Workspace/arrow:. 'AMD64 Conda C++'
> >
> > Mount destination `.` is relative to the build directory on the workers.
> >
> > The CI configuration for arrow is available here:
> > https://github.com/ursa-labs/ursabot/tree/master/projects/arrow
>
> As I've already said: most build configuration should *not* be in the
> buildmaster configuration.  Otherwise this forces a unique build
> configuration (for all branches, for all PRs) and it also forces a restart
> of the buildmaster when changing the build configuration (which is not a
> good idea).
>
> Compare with Travis-CI or other services:
> - the CI configuration and scripts are local to the Arrow repository
>
This is the plan: to move the arrow configuration into the arrow repository
so that it can be governed by the arrow community.

> - each PR or branch can change the CI configuration without impacting
> other builds
>
We can introduce automation for that, but it has security concerns.
In the worst case, we can run the ursabot builders on any public CI
service, like we actually run the arrow builders on the ursabot repository:
https://travis-ci.org/ursa-labs/ursabot/builds/569364742

> - one can change the CI configuration without having to restart a global
> daemon or service
>
With a self-hosted infrastructure it is not that easy, and it at least
involves security concerns. But we can still develop it, if it is a desired
feature.

>
> Regards
>
> Antoine.
>

Re: Ursabot configuration within Arrow

Posted by Antoine Pitrou <an...@python.org>.
Le 08/08/2019 à 16:12, Krisztián Szűcs a écrit :
> Hi All!
> 
> Ursabot now supports debugging failed builds by attaching shells to the
> still running containers right after a failing build step:
> 
> $ ursabot project build --attach-on-failure 'AMD64 Conda C++'
> 
> Local source/git directories can also be mounted into the builder
> instead of cloning arrow, which makes debugging a lot easier:
> 
> $ ursabot project build -s ~/Workspace/arrow:. 'AMD64 Conda C++'
> 
> Mount destination `.` is relative to the build directory on the workers.
> 
> The CI configuration for arrow is available here:
> https://github.com/ursa-labs/ursabot/tree/master/projects/arrow

As I've already said: most build configuration should *not* be in the
buildmaster configuration.  Otherwise this forces a unique build
configuration (for all branches, for all PRs) and it also forces a restart
of the buildmaster when changing the build configuration (which is not a
good idea).

Compare with Travis-CI or other services:
- the CI configuration and scripts are local to the Arrow repository
- each PR or branch can change the CI configuration without impacting
other builds
- one can change the CI configuration without having to restart a global
daemon or service

Regards

Antoine.